Abstract

This paper deals with four types of point estimators based on minimization of 
information-theoretic divergences between hypothetical and empirical distributions. 
These were introduced 

(i) by Liese & Vajda (2006) and independently Broniatowski & Keziou (2006), called 
here power superdivergence estimators, 

(ii) by Broniatowski & Keziou (2009), called here power subdivergence estimators, 

(iii) by Basu et al. (1998), called here power pseudodistance estimators, and 

(iv) by Vajda (2008), called here Rényi pseudodistance estimators.

The paper studies and compares general properties of these estimators such as
consistency and influence curves, and illustrates these properties by detailed analysis
of the applications to the estimation of normal location and scale.

1 BASIC CONCEPTS AND RESULTS



Let $\phi : (0,\infty) \mapsto \mathbb{R}$ be a twice differentiable strictly convex function with $\phi(1) = 0$ and
(possibly infinite) continuous extension to $t \to 0+$ denoted by $\phi(0)$, and let $\Phi$ be the class
of all such functions. For every $\phi \in \Phi$ we consider the adjoint function

$$\phi^*(t) = t\,\phi(1/t) \quad \text{where } \phi^* \in \Phi,\ (\phi^*)^* = \phi. \tag{1}$$

For every $\phi \in \Phi$ we consider the $\phi$-divergence of probability measures $P$ and $Q$ on a
measurable space $(\mathcal{X},\mathcal{A})$ with densities $p$, $q$ w.r.t. a dominating $\sigma$-finite measure $\lambda$. In this
paper we deal with $P$, $Q$ which are either measure-theoretically equivalent (i.e. satisfying
$pq > 0$ $\lambda$-a.s., in symbols $P \equiv Q$) or measure-theoretically orthogonal (i.e. satisfying
$pq = 0$ $\lambda$-a.s., in symbols $P \perp Q$). Thus, by Liese and Vajda (1987 or 2006), for all $P$, $Q$
under consideration

$$D_\phi(P,Q) = \begin{cases} \int \phi(p/q)\,dQ & \text{if } P \equiv Q \\ \phi(0) + \phi^*(0) & \text{if } P \perp Q \end{cases} \tag{2}$$

where the range of values is

$$0 \le D_\phi(P,Q) \le \phi(0) + \phi^*(0) \tag{3}$$

and $D_\phi(P,Q) = 0$ iff $P = Q$, while $D_\phi(P,Q) = \phi(0) + \phi^*(0)$ if (for $\phi(0) + \phi^*(0) < \infty$ iff)
$P \perp Q$. Another important property is the skew symmetry

$$D_\phi(Q,P) = D_{\phi^*}(P,Q). \tag{4}$$

We shall deal mainly with the power divergences 

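$$D_\alpha(P,Q) := D_{\phi_\alpha}(P,Q) \quad \text{of real powers } \alpha \in \mathbb{R} \tag{5}$$

for the power functions $\phi_\alpha \in \Phi$ defined by

$$\phi_\alpha(t) = \frac{t^\alpha - \alpha t + \alpha - 1}{\alpha(\alpha - 1)} \quad \text{if } \alpha(\alpha-1) \neq 0 \tag{6}$$

and otherwise by the corresponding limits

$$\phi_0(t) = -\ln t + t - 1, \qquad \phi_1(t) = \phi_0^*(t) = t\ln t - t + 1. \tag{7}$$

It is easy to verify for all $\alpha \in \mathbb{R}$ the relation

$$\phi_\alpha^* = \phi_{1-\alpha} \quad \text{so that} \quad D_\alpha(Q,P) = D_{1-\alpha}(P,Q).$$

For $P \equiv Q$ we get from (2) and (5) - (7)

$$D_\alpha(P,Q) = \begin{cases} \dfrac{1}{\alpha(\alpha-1)}\left[\int (p/q)^\alpha\,dQ - 1\right] & \text{if } \alpha(\alpha-1) \neq 0 \\[2mm] \int \ln(p/q)\,dP = D_0(Q,P) & \text{if } \alpha = 1 \end{cases} \tag{8}$$

and for $P \perp Q$ similarly

$$D_\alpha(P,Q) = \begin{cases} 1/\alpha(1-\alpha) & \text{if } 0 < \alpha < 1 \\ \infty & \text{otherwise.} \end{cases} \tag{9}$$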

The special cases $D_2(P,Q)$ or $D_1(P,Q)$ are sometimes called Pearson or Kullback
divergences and $D_{-1}(P,Q) = D_2(Q,P)$ or $D_0(P,Q) = D_1(Q,P)$ reversed Pearson or reversed
Kullback divergences, respectively.
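The relations (1), (6), (7) and (9) can be checked numerically. The following is a minimal sketch, assuming nothing beyond the definitions above; the function names `phi_alpha` and `adjoint` are illustrative and do not come from the paper.

```python
import math

def phi_alpha(alpha, t):
    """Power function phi_alpha(t) of (6)-(7), defined for t > 0."""
    if alpha == 0:
        return -math.log(t) + t - 1.0
    if alpha == 1:
        return t * math.log(t) - t + 1.0
    return (t ** alpha - alpha * t + alpha - 1.0) / (alpha * (alpha - 1.0))

def adjoint(alpha, t):
    """Adjoint function phi*(t) = t * phi(1/t) of (1), applied to phi_alpha."""
    return t * phi_alpha(alpha, 1.0 / t)

# Adjoint relation phi_alpha* = phi_{1-alpha} at a few sample points.
for a in (-1, 0, 0.3, 1, 2):
    for t in (0.2, 1.0, 3.5):
        print(a, t, adjoint(a, t), phi_alpha(1 - a, t))

# phi(0) + phi*(0) approximated at a tiny argument, compared with 1/(alpha(1-alpha)) in (9).
tiny = 1e-12
print(phi_alpha(0.5, tiny) + adjoint(0.5, tiny), 1.0 / (0.5 * 0.5))
```

The two printed columns in the first loop should agree up to rounding, and the last line should print two values close to 4.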

The $\phi$-divergences and power divergences will be applied in the standard statistical
estimation model with i.i.d. observations $X_1, \ldots, X_n$ governed by $P_{\theta_0}$ from a family
$\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ of probability measures on $(\mathcal{X},\mathcal{A})$ indexed by a set of parameters
$\Theta \subset \mathbb{R}^d$. The parameter $\theta_0$ is assumed to be identifiable and the family $\mathcal{P}$
measure-theoretically equivalent in the sense

$$P_\theta \neq P_{\theta_0} \ \text{ and } \ P_\theta \equiv P_{\theta_0} \quad \text{for all } \theta, \theta_0 \in \Theta \text{ with } \theta \neq \theta_0. \tag{10}$$

Further, the family is assumed to be continuous (nonatomic) in the sense

$$P_\theta(\{x\}) = 0 \quad \text{for all } x \in \mathcal{X},\ \theta \in \Theta \tag{11}$$

and dominated by a $\sigma$-finite measure $\lambda$ with densities

$$p_\theta = dP_\theta/d\lambda \quad \text{for all } \theta \in \Theta. \tag{12}$$

In this model the parameter $\theta_0$ is assumed to be estimated on the basis of observations
$X_1, \ldots, X_n$ by measurable functions $\theta_n : \mathcal{X}^n \mapsto \Theta$ called estimates. A collection of estimates
for various sample sizes $n$ is an estimator. Estimators are denoted in this paper by the
same symbols $\theta_n$ as the corresponding estimates.

The assumed strict convexity of $\phi(t)$ at $t = 1$ together with the identifiability of
$\theta_0$ assumed in (10) means that $D_\phi(P_\theta, P_{\theta_0}) \ge 0$ for all $\theta, \theta_0 \in \Theta$ with equality iff
$\theta = \theta_0$. In other words, the unknown parameter $\theta_0$ is the unique minimizer of the function
$D_\phi(P_\theta, P_{\theta_0})$ of variable $\theta \in \Theta$,

$$\theta_0 = \operatorname{argmin}_\theta D_\phi(P_\theta, P_{\theta_0}) \quad \text{for every } \theta_0 \in \Theta. \tag{13}$$

Further, the observations $X_1, \ldots, X_n$ are in a statistically sufficient manner represented
by the empirical probability measure

$$P_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i} \tag{14}$$

where $\delta_x$ denotes the Dirac probability measure with all mass concentrated at $x \in \mathcal{X}$.
The empirical probability measures $P_n$ are known to converge weakly to $P_{\theta_0}$ as $n \to \infty$.
Therefore by plugging in (13) the measures $P_n$ for $P_{\theta_0}$ one intuitively expects to obtain
the estimator

$$\theta_n = \theta_{n,\phi} := \operatorname{argmin}_\theta D_\phi(P_\theta, P_n) \tag{15}$$

which estimates $\theta_0$ consistently in the usual sense of the convergence $\theta_n \to \theta_0$ for $n \to \infty$.
However, the reality is different: the problem is that the continuous family $\mathcal{P}$ under
consideration and the discrete family $\mathcal{P}_{emp}$ of empirical distributions (14) are mutually orthogonal, so that

$$P_\theta \perp P_n \ \Rightarrow \ D_\phi(P_\theta, P_n) = \phi(0) + \phi^*(0) \quad \text{when } P_\theta \in \mathcal{P} \text{ and } P_n \in \mathcal{P}_{emp}. \tag{16}$$

This means that the estimates $\theta_n$ proposed in (15) are trivial, with the argmin $= \Theta$.

In the following two sections we list and motivate several modifications of the minimum
divergence rule (15) which allow one to bypass the problem (16). Some of them are new
and some are known from the previous literature. We illustrate the general forms of these
estimators by applying them to the basic standard statistical families and investigate their
robustness. The model of robust statisticians is richer than the standard statistical model
defined by the triplet

$$(\mathcal{X}, \mathcal{A}, \mathcal{Q}) \quad \text{with } \mathcal{Q} = \mathcal{P} \cup \mathcal{P}_{emp}$$

introduced above. Namely, in addition to the hypothesis that the observations $X_1, \ldots, X_n$
are i.i.d. by $P_{\theta_0} \in \mathcal{P}$, the model of robust statistics admits the alternative that the
observations are distributed by a probability measure $P_0 \notin \mathcal{P}$ with density

$$\frac{dP_0}{d\lambda} = p_0.$$

Throughout this paper we assume that $P_0$ is measure-theoretically equivalent with the
probability measures from $\mathcal{P}$ and we consider the probability measures

$$P \in \mathcal{P} \ \text{ and } \ Q \in \mathcal{Q} = \mathcal{P}^+ \cup \mathcal{P}_{emp} \quad \text{where } \mathcal{P}^+ = \mathcal{P} \cup \{P_0\}. \tag{17}$$

Measures $P$, $Q$ are either measure-theoretically equivalent (if $Q \in \mathcal{P}^+$) or measure-theoretically
orthogonal (if $Q \in \mathcal{P}_{emp}$). Therefore the $\phi$-divergences $D_\phi(P,Q)$ are well defined by (2)
for all pairs $P$, $Q$ considered in this paper. Further, we denote by $L_1(Q)$ the set of all
absolutely $Q$-integrable functions $f : \mathcal{X} \mapsto \mathbb{R}$ and put for brevity

$$Q \cdot f = \int f\,dQ \quad \text{for } f \in L_1(Q). \tag{18}$$



In the rest of this section we introduce basic concepts and results of robust statistics
needed in the sequel. Let us consider the Dirac probability measures $\delta_x \in \mathcal{P}_{emp}$, $x \in \mathcal{X}$
and denote by $\mathcal{C}(\mathcal{Q})$ the set of the convex mixtures

$$Q_{x,\varepsilon} = (1-\varepsilon)Q + \varepsilon\delta_x \quad \text{for all } x \in \mathcal{X},\ Q \in \mathcal{Q} \text{ and } 0 < \varepsilon < 1. \tag{19}$$

Further, consider a mapping $M(Q,\theta) : \mathcal{C}(\mathcal{Q}) \otimes \Theta \mapsto \mathbb{R}$ differentiable in $\theta \in \Theta$ for each
$Q \in \mathcal{C}(\mathcal{Q})$ with the derivatives

$$\Psi(Q,\theta) = \frac{d}{d\theta}M(Q,\theta) \tag{20}$$

and let $T(Q) \in \Theta$ solve the equation $\Psi(Q,\theta) = 0$ in the variable $\theta \in \Theta$ for $Q \in \mathcal{C}(\mathcal{Q})$. The
following definition and theorem deal with the general M-estimators

$$\theta_n = \operatorname{argmin}_\theta M(P_n,\theta) \quad \text{i.e.} \quad \theta_n = T(P_n) \ \text{ for } P_n \in \mathcal{P}_{emp}.$$

Both the definition and theorem are variants of the well known classical results of robust
statistics, see e.g. Hampel et al. (1986).

Definition 1.1. If for some $Q \in \mathcal{P}^+$ the limits

$$\mathrm{IF}(x; T, Q) = \lim_{\varepsilon \downarrow 0} \frac{T(Q_{\varepsilon,x}) - T(Q)}{\varepsilon} \tag{21}$$

exist for all $x \in \mathcal{X}$ then (21) is called the influence function of the estimator $\theta_n$ on $\mathcal{X}$ at $Q$.

In the following theorem we consider the functions

$$\psi(x,\theta) = \Psi(\delta_x,\theta) \tag{22}$$

and assume the existence of the derivatives

$$\dot\psi(x,\theta) = \left(\frac{d}{d\theta}\right)^t \psi(x,\theta) \quad \text{on } \mathcal{X} \otimes \Theta \ \ (\text{with } t \text{ for transpose}) \tag{23}$$

as well as the expectations

$$I(Q) = Q \cdot \dot\psi(x, T(Q)), \quad Q \in \mathcal{P}^+. \tag{24}$$

Theorem 1.1. If the influence function (21) exists then it is given by the formula

$$\mathrm{IF}(x; T, Q) = -I(Q)^{-1}\,\psi(x, T(Q)) \tag{25}$$

for the inverse of the matrix (24).



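Proof. By definition of $T$, for any $Q \in \mathcal{P}^+$ and $Q_{\varepsilon,x}$ considered in (19) it holds

$$0 = \frac{Q_{\varepsilon,x} \cdot \psi(x, T(Q_{\varepsilon,x})) - Q \cdot \psi(x, T(Q))}{\varepsilon} = \frac{Q \cdot [\psi(x, T(Q_{\varepsilon,x})) - \psi(x, T(Q))]}{\varepsilon} + (\delta_x - Q) \cdot \psi(x, T(Q_{\varepsilon,x})).$$

Here

$$\lim_{\varepsilon \downarrow 0} \frac{Q \cdot [\psi(x, T(Q_{\varepsilon,x})) - \psi(x, T(Q))]}{\varepsilon} = Q \cdot \left[\left(\frac{d}{d\theta}\right)^t \psi(x,\theta)\right]_{\theta = T(Q)} \cdot \lim_{\varepsilon \downarrow 0} \frac{T(Q_{\varepsilon,x}) - T(Q)}{\varepsilon} = Q \cdot \dot\psi(x, T(Q))\ \mathrm{IF}(x; T, Q)$$

and

$$\lim_{\varepsilon \downarrow 0} (\delta_x - Q) \cdot \psi(x, T(Q_{\varepsilon,x})) = \lim_{\varepsilon \downarrow 0} \left[\psi(x, T(Q_{\varepsilon,x})) - Q \cdot \psi(x, T(Q_{\varepsilon,x}))\right] = \psi(x, T(Q)) - Q \cdot \psi(x, T(Q)) = \psi(x, T(Q)).$$

Therefore we have proved the relation

$$0 = I(Q) \cdot \mathrm{IF}(x; T, Q) + \psi(x, T(Q))$$

which implies (25). ■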
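The estimator $\theta_n = T(P_n)$ is said to be Fisher consistent if

$$T(P_\theta) = \theta \quad \text{for all } \theta \in \Theta. \tag{26}$$

In the following Corollary and in the sequel, we put

$$\mathrm{IF}(x; T, \theta) = \mathrm{IF}(x; T, P_\theta) \quad \text{and} \quad I(\theta) = I(P_\theta) \ \ (\text{cf. (24)}). \tag{27}$$

Corollary 1.1. The influence function of a Fisher consistent estimator at $Q = P_\theta$ is

$$\mathrm{IF}(x; T, \theta) = -I(\theta)^{-1}\,\psi(x, \theta). \tag{28}$$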
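Formula (25) can be compared with the defining limit (21) on a toy example. The sketch below is only an illustration: the smooth function `psi` (a hyperbolic tangent) and the discrete distribution `Q` are assumptions chosen for convenience and are not taken from the paper.

```python
import math

def psi(x, theta):
    # Illustrative smooth psi-function (an assumption, not from the paper).
    return math.tanh(theta - x)

def psi_dot(x, theta):
    # Derivative of psi with respect to theta.
    return 1.0 - math.tanh(theta - x) ** 2

def solve_T(points, weights, lo=-50.0, hi=50.0):
    """Root of Psi(Q, theta) = sum_i w_i psi(x_i, theta), found by bisection.
    Psi is increasing in theta for this choice of psi."""
    def big_psi(theta):
        return sum(w * psi(x, theta) for x, w in zip(points, weights))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if big_psi(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def influence_formula(x, points, weights):
    """IF(x; T, Q) = -I(Q)^{-1} psi(x, T(Q)), cf. (24)-(25)."""
    t = solve_T(points, weights)
    info = sum(w * psi_dot(xi, t) for xi, w in zip(points, weights))
    return -psi(x, t) / info

def influence_finite_eps(x, points, weights, eps=1e-6):
    """Difference quotient of (21) for the mixture (1 - eps) Q + eps delta_x."""
    t0 = solve_T(points, weights)
    mixed_points = list(points) + [x]
    mixed_weights = [(1.0 - eps) * w for w in weights] + [eps]
    return (solve_T(mixed_points, mixed_weights) - t0) / eps

Q_points, Q_weights = [-1.0, 0.0, 0.5, 2.0], [0.25, 0.25, 0.25, 0.25]
print(influence_formula(3.0, Q_points, Q_weights))
print(influence_finite_eps(3.0, Q_points, Q_weights))
```

Both printed values should agree to several digits, the second one being a finite-difference approximation of the first.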

2 SUBDIVERGENCES AND SUPERDIVERGENCES 

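Throughout this section we use the likelihood ratios $\ell_{\theta,\tilde\theta} = p_\theta/p_{\tilde\theta}$ well defined a.s. on $\mathcal{X}$
in the statistical model under consideration, the nonincreasing functions

$$\phi^{\#}(t) = \phi(t) - t\phi'(t) \quad \text{for every } \phi \in \Phi \tag{29}$$

where $\phi'$ denotes the derivative of $\phi$, and we restrict ourselves to the families $\mathcal{P}$ such that

$$\{\phi(\ell_{\theta,\tilde\theta}),\ \phi'(\ell_{\theta,\tilde\theta}),\ \phi^{\#}(\ell_{\theta,\tilde\theta})\} \subset L_1(Q) \quad \text{for all } \theta, \tilde\theta \in \Theta \text{ and } Q \in \mathcal{Q}. \tag{30}$$

Obviously, this assumption automatically holds for all $Q = P_n \in \mathcal{P}_{emp}$. Finally, for all
pairs $\theta, \tilde\theta \in \Theta$ we consider the functions $L_\phi(\theta,\tilde\theta) = L_\phi(\theta,\tilde\theta,x)$ of variable $x \in \mathcal{X}$ defined
by the formula

$$L_\phi(\theta,\tilde\theta) = P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta}) + \phi^{\#}(\ell_{\theta,\tilde\theta}).$$

Due to (30), the functions $L_\phi(\theta,\tilde\theta)$ are $Q$-integrable for all $Q \in \mathcal{Q}$. Consider the family
of finite expectations

$$\underline{D}_{\phi,\tilde\theta}(P_\theta,Q) = Q \cdot L_\phi(\theta,\tilde\theta) = P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta}) + Q \cdot \phi^{\#}(\ell_{\theta,\tilde\theta}), \quad (P_\theta,Q) \in \mathcal{P} \otimes \mathcal{Q} \tag{31}$$

parametrized by $(\phi,\tilde\theta) \in \Phi \otimes \Theta$. Broniatowski & Keziou (2006) and Liese & Vajda (2006)
independently established a general supremal representation of $\phi$-divergences $D_\phi(P,Q)$
which implies the following result.

Theorem 2.1. For each $(P_\theta, P_{\theta_0}) \in \mathcal{P} \otimes \mathcal{P}$ and $\phi \in \Phi$, the $\phi$-divergence $D_\phi(P_\theta, P_{\theta_0})$ is the
maximum of the finite expectations $\underline{D}_{\phi,\tilde\theta}(P_\theta, P_{\theta_0})$ over $\tilde\theta \in \Theta$ attained at the unique point
$\tilde\theta = \theta_0$. In other words,

$$D_\phi(P_\theta, P_{\theta_0}) \ge \underline{D}_{\phi,\tilde\theta}(P_\theta, P_{\theta_0}) \quad \text{for all } \theta, \tilde\theta \in \Theta \tag{32}$$

where the equality holds iff $\tilde\theta = \theta_0$.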



Proof. For the sake of completeness we present the simple proof of Liese and Vajda.
For fixed $s > 0$, the strictly convex function $\phi(t)$ is strictly above the straight line
$\phi(s) + \phi'(s)(t - s)$ except at $t = s$, i.e.

$$\phi(t) \ge \phi(s) + \phi'(s)(t - s)$$

with equality only for $t = s$. Putting in this inequality $t = \ell_{\theta,\theta_0}$, $s = \ell_{\theta,\tilde\theta}$ and
integrating both sides over $P_{\theta_0}$ we get (32) including the iff condition for the equality. ■
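Inequality (32) can be observed numerically in the unit-variance normal location family with the power function $\phi_\alpha$. The sketch below approximates the integrals in (2) and (31) by a plain trapezoid rule; the grid, the integration range and all function names are assumptions made for this illustration only.

```python
import math

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def ratio(x, a, b):
    """Likelihood ratio p_a(x) / p_b(x) of unit-variance normal densities."""
    return math.exp(0.5 * ((x - b) ** 2 - (x - a) ** 2))

def integrate(f, lo=-12.0, hi=12.0, n=4800):
    """Trapezoid rule on [lo, hi]; the Gaussian tails outside are negligible here."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def phi(alpha, t):
    return (t ** alpha - alpha * t + alpha - 1.0) / (alpha * (alpha - 1.0))

def phi_prime(alpha, t):
    return (t ** (alpha - 1.0) - 1.0) / (alpha - 1.0)

def phi_sharp(alpha, t):
    # phi(t) - t * phi'(t) simplifies to (1 - t^alpha) / alpha for the power function.
    return (1.0 - t ** alpha) / alpha

def divergence(alpha, theta, theta0):
    """D_alpha(P_theta, P_theta0) as the integral of phi(p/q) dQ, cf. (2)."""
    return integrate(lambda x: phi(alpha, ratio(x, theta, theta0)) * normal_pdf(x, theta0))

def subdivergence(alpha, theta, escort, theta0):
    """P_theta . phi'(l) + P_theta0 . phi#(l) with l = p_theta / p_escort, cf. (31)."""
    first = integrate(lambda x: phi_prime(alpha, ratio(x, theta, escort)) * normal_pdf(x, theta))
    second = integrate(lambda x: phi_sharp(alpha, ratio(x, theta, escort)) * normal_pdf(x, theta0))
    return first + second

alpha, theta, theta0 = 0.5, 1.0, 0.0
print("divergence:", divergence(alpha, theta, theta0))
for escort in (-1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(escort, subdivergence(alpha, theta, escort, theta0))
```

Among the printed subdivergence values only the one at the escort value 0 (equal to $\theta_0$) should reach the divergence, the others staying strictly below it.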

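Theorem 2.1 implies the formula

$$D_\phi(P_\theta, Q) = \max_{\tilde\theta} \underline{D}_{\phi,\tilde\theta}(P_\theta, Q) \quad \text{for all } (P_\theta, Q) \in \mathcal{P} \otimes \mathcal{P} \tag{33}$$

which justifies us to interpret $\underline{D}_{\phi,\tilde\theta}(P_\theta, Q)$ as subdivergences of $P_\theta$, $Q$ with parameters
$(\phi,\tilde\theta) \in \Phi \otimes \Theta$.

Now we introduce the family of suprema

$$\overline{D}_\phi(P_\theta, Q) := \sup_{\tilde\theta \in \Theta} \underline{D}_{\phi,\tilde\theta}(P_\theta, Q) \quad \text{for all } (P_\theta, Q) \in \mathcal{P} \otimes \mathcal{Q} \tag{34}$$

parametrized by $\phi \in \Phi$. This family extends the $\phi$-divergences $D_\phi(P,Q)$ from the domain
$\mathcal{P} \otimes \mathcal{P}$ to $\mathcal{P} \otimes \mathcal{Q}$. Indeed, by Theorem 2.1,

$$\overline{D}_\phi(P_\theta, Q) = D_\phi(P_\theta, Q) \quad \text{for all } (P_\theta, Q) \in \mathcal{P} \otimes \mathcal{P}. \tag{35}$$

This justifies us to interpret $\overline{D}_\phi(P_\theta, Q)$ as superdivergences of $(P_\theta, Q) \in \mathcal{P} \otimes \mathcal{Q}$ with
parameters $\phi \in \Phi$.

Note that (33) need not hold for $Q \notin \mathcal{P}$ because if $Q = P_n \in \mathcal{P}_{emp}$ then the
superdivergence values $\overline{D}_\phi(P_\theta, P_n)$ differ from the constant divergence values $D_\phi(P_\theta, P_n) =
\phi(0) + \phi^*(0)$ (cf. (16)).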

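The subdivergences $\underline{D}_{\phi,\tilde\theta}(P_\theta, P_n)$ and superdivergences $\overline{D}_\phi(P_\theta, P_n)$ can replace the
divergences $D_\phi(P_\theta, P_n)$ as optimality criteria in the definition of M-estimators. Let us consider
the families of functionals $T_{\phi,\theta} : \mathcal{Q} \mapsto \Theta$ and $T_\phi : \mathcal{Q} \mapsto \Theta$ defined by

$$T_{\phi,\theta}(Q) = \operatorname{argmax}_{\tilde\theta} \underline{D}_{\phi,\tilde\theta}(P_\theta, Q) \quad \text{for } (\phi,\theta) \in \Phi \otimes \Theta \tag{36}$$

and

$$T_\phi(Q) = \operatorname{argmin}_\theta \overline{D}_\phi(P_\theta, Q) \quad \text{for } \phi \in \Phi \tag{37}$$

respectively. Replacing the general argument $Q$ by $P_n$ defined by (14) we obtain the
maximum subdivergence estimators (briefly, the max$\underline{D}_\phi$-estimators)

$$\theta_{\phi,\theta,n} = T_{\phi,\theta}(P_n) = \operatorname{argmax}_{\tilde\theta} \underline{D}_{\phi,\tilde\theta}(P_\theta, P_n) \tag{38}$$

$$= \operatorname{argmax}_{\tilde\theta} \left[P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta}) + P_n \cdot \phi^{\#}(\ell_{\theta,\tilde\theta})\right] \quad (\text{cf. (31)})$$

$$= \operatorname{argmax}_{\tilde\theta} \left[P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta}) + \frac{1}{n}\sum_{i=1}^n \phi^{\#}(\ell_{\theta,\tilde\theta}(X_i))\right] \tag{39}$$

with escort parameters $\theta \in \Theta$, and the minimum superdivergence estimators
(briefly, the min$\overline{D}_\phi$-estimators)

$$\theta_{\phi,n} = T_\phi(P_n) = \operatorname{argmin}_\theta \overline{D}_\phi(P_\theta, P_n) = \operatorname{argmin}_\theta \sup_{\tilde\theta} \underline{D}_{\phi,\tilde\theta}(P_\theta, P_n) \quad (\text{cf. (34)}) \tag{40}$$

$$= \operatorname{argmin}_\theta \sup_{\tilde\theta} \left[P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta}) + P_n \cdot \phi^{\#}(\ell_{\theta,\tilde\theta})\right] \quad (\text{cf. (31)})$$

$$= \operatorname{argmin}_\theta \sup_{\tilde\theta} \left[P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta}) + \frac{1}{n}\sum_{i=1}^n \phi^{\#}(\ell_{\theta,\tilde\theta}(X_i))\right]. \tag{41}$$

Theorem 2.2. The max$\underline{D}_\phi$-estimators as well as the min$\overline{D}_\phi$-estimators are Fisher
consistent.

Proof. By (33) and (35),

$$T_{\phi,\theta}(P_{\theta_0}) = \operatorname{argmax}_{\tilde\theta} \underline{D}_{\phi,\tilde\theta}(P_\theta, P_{\theta_0}) = \theta_0 \quad \text{for } (\phi,\theta) \in \Phi \otimes \Theta \tag{42}$$

and

$$T_\phi(P_{\theta_0}) = \operatorname{argmin}_\theta D_\phi(P_\theta, P_{\theta_0}) = \theta_0 \quad \text{for } \phi \in \Phi \tag{43}$$

which completes the proof. ■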

The min$\overline{D}_\phi$-estimators were proposed independently by Liese & Vajda (2006) under
the name modified $\phi$-divergence estimators and Broniatowski & Keziou (2006) under
the name minimum dual $\phi$-divergence estimators. The max$\underline{D}_\phi$-estimators were
proposed by Broniatowski and Keziou (2009) and called dual $\phi$-divergence estimators by
them. Both types of these estimators were in the cited papers motivated by the mentioned
Fisher consistency and by the property easily verifiable from (39) and (41), namely that
$\phi(t) = -\ln t$ implies

$$\theta_{\phi,\theta,n} = \operatorname{argmax}_{\tilde\theta} \sum_{i=1}^n \ln p_{\tilde\theta}(X_i) \quad \text{and} \quad \theta_{\phi,n} = \operatorname{argmax}_\theta \sum_{i=1}^n \ln p_\theta(X_i) \tag{44}$$

where the left equality holds for all escort parameters $\theta \in \Theta$. In other words, the
logarithmic choice $\phi(t) = -\ln t$ reduces all the variants of the max$\underline{D}_\phi$-estimator as well as
the min$\overline{D}_\phi$-estimator to the MLE. It is challenging to investigate the extent to which
the max$\underline{D}_\phi$-estimators $\theta_{\phi,\theta,n}$ and the min$\overline{D}_\phi$-estimator $\theta_{\phi,n}$ as extensions of the MLE are
efficient and robust under various specifications of $\phi$, $\theta$ and $\phi$ respectively.
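The reduction to the MLE stated in (44) follows in two lines from (29) and (31). The following worked step is a sketch using only the definitions above and the mutual equivalence of the measures in the family.

```latex
% For \phi(t) = -\ln t we have \phi'(t) = -1/t and \phi^{\#}(t) = \phi(t) - t\phi'(t) = 1 - \ln t, hence
\begin{align*}
P_\theta \cdot \phi'(\ell_{\theta,\tilde\theta})
   &= -\int \frac{p_{\tilde\theta}}{p_\theta}\, p_\theta \, d\lambda = -1, \\
P_n \cdot \phi^{\#}(\ell_{\theta,\tilde\theta})
   &= 1 - \frac{1}{n}\sum_{i=1}^n \ln p_\theta(X_i) + \frac{1}{n}\sum_{i=1}^n \ln p_{\tilde\theta}(X_i), \\
\underline{D}_{\phi,\tilde\theta}(P_\theta, P_n)
   &= \frac{1}{n}\sum_{i=1}^n \ln p_{\tilde\theta}(X_i) - \frac{1}{n}\sum_{i=1}^n \ln p_\theta(X_i).
\end{align*}
% Maximizing over \tilde\theta maximizes the log-likelihood for every escort \theta;
% taking the supremum over \tilde\theta and then minimizing over \theta again
% maximizes \sum_i \ln p_\theta(X_i).
```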



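In this paper we restrict ourselves to special subclasses of the power divergences
$D_\alpha(P,Q) := D_{\phi_\alpha}(P,Q)$ defined by (5) - (7). For the power functions $\phi_\alpha$ from (6), (7)
we get the functions

$$\phi'_\alpha(t) = \begin{cases} \dfrac{t^{\alpha-1} - 1}{\alpha - 1} & \text{for } \alpha \neq 1 \\[2mm] \lim_{\alpha \to 1} \dfrac{t^{\alpha-1} - 1}{\alpha - 1} = \ln t & \text{for } \alpha = 1 \end{cases} \tag{45}$$

and

$$\phi^{\#}_\alpha(t) = \phi_\alpha(t) - t\phi'_\alpha(t) = \begin{cases} \dfrac{1}{\alpha}(1 - t^\alpha) & \text{for } \alpha \neq 0 \\[2mm] \lim_{\alpha \to 0} \dfrac{1}{\alpha}(1 - t^\alpha) = -\ln t & \text{for } \alpha = 0. \end{cases} \tag{46}$$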
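As a short check of (46), the power case can be obtained directly from (6) and (45); this worked step uses only those two formulas.

```latex
% For \alpha(\alpha - 1) \neq 0:
\begin{align*}
\phi_\alpha(t) - t\,\phi'_\alpha(t)
  &= \frac{t^\alpha - \alpha t + \alpha - 1}{\alpha(\alpha-1)} - \frac{\alpha t^{\alpha} - \alpha t}{\alpha(\alpha-1)} \\
  &= \frac{(1-\alpha)\,t^\alpha + \alpha - 1}{\alpha(\alpha-1)}
   = \frac{1 - t^\alpha}{\alpha}.
\end{align*}
% For \alpha = 0, (7) gives \phi_0(t) - t\phi_0'(t) = (-\ln t + t - 1) - t(1 - 1/t) = -\ln t,
% which is also the limit of (1 - t^\alpha)/\alpha as \alpha \to 0.
```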



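They lead to the max$\underline{D}_\alpha$-estimators (briefly, power subdivergence estimators)

$$\theta_{\alpha,\theta,n} = \operatorname{argmax}_{\tilde\theta} \left\{P_\theta \cdot \phi'_\alpha\!\left(\frac{p_\theta}{p_{\tilde\theta}}\right) + P_n \cdot \phi^{\#}_\alpha\!\left(\frac{p_\theta}{p_{\tilde\theta}}\right)\right\} \tag{47}$$

with power parameters $\alpha \in \mathbb{R}$ and escort parameters $\theta \in \Theta$ and to the min$\overline{D}_\alpha$-estimators
(briefly, power superdivergence estimators)

$$\theta_{\alpha,n} = \operatorname{argmin}_\theta \sup_{\tilde\theta} \left\{P_\theta \cdot \phi'_\alpha\!\left(\frac{p_\theta}{p_{\tilde\theta}}\right) + P_n \cdot \phi^{\#}_\alpha\!\left(\frac{p_\theta}{p_{\tilde\theta}}\right)\right\} \tag{48}$$

with power parameters $\alpha \in \mathbb{R}$. If the argmaxima in (47) exist then

$$\theta_{\alpha,n} = \operatorname{argmin}_\theta \left\{P_\theta \cdot \phi'_\alpha\!\left(\frac{p_\theta}{p_{\theta_{\alpha,\theta,n}}}\right) + P_n \cdot \phi^{\#}_\alpha\!\left(\frac{p_\theta}{p_{\theta_{\alpha,\theta,n}}}\right)\right\}. \tag{49}$$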



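The next two subsections deal correspondingly with the max$\underline{D}_\alpha$-estimators and min$\overline{D}_\alpha$-estimators.
In both subsections we consider the power parameters $\alpha \ge 0$. Since $\phi^{\#}_0(t) =
-\ln t$, we see from (44) that

$$\theta_{0,\theta,n} = \operatorname{argmax}_{\tilde\theta} \sum_{i=1}^n \ln p_{\tilde\theta}(X_i) \quad \text{and} \quad \theta_{0,n} = \operatorname{argmax}_\theta \sum_{i=1}^n \ln p_\theta(X_i) \tag{50}$$

are the MLE's. If $\alpha > 0$ then by (45) - (48),

$$\theta_{\alpha,\theta,n} = \operatorname{argmin}_{\tilde\theta} M_{\alpha,\theta}(P_n,\tilde\theta) \tag{51}$$

and

$$\theta_{\alpha,n} = \operatorname{argmax}_\theta \inf_{\tilde\theta} M_{\alpha,\theta}(P_n,\tilde\theta) = \operatorname{argmax}_\theta M_{\alpha,\theta}(P_n,\theta_{\alpha,\theta,n}) \tag{52}$$

where

$$M_{\alpha,\theta}(Q,\tilde\theta) = \begin{cases} \dfrac{1}{1-\alpha}\,P_{\tilde\theta} \cdot \left(\dfrac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} + \dfrac{1}{\alpha}\,Q \cdot \left(\dfrac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} & \text{if } \alpha > 0,\ \alpha \neq 1 \\[3mm] -P_\theta \cdot \ln \dfrac{p_\theta}{p_{\tilde\theta}} + Q \cdot \dfrac{p_\theta}{p_{\tilde\theta}} & \text{if } \alpha = 1 \end{cases} \tag{53}$$

for all $Q \in \mathcal{Q}$.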

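Throughout both subsections we restrict ourselves to the densities $p_\theta$ twice differentiable
with respect to $\theta \in \Theta \subset \mathbb{R}^d$, we put

$$s_\theta = \frac{d}{d\theta}\ln p_\theta \quad \text{and} \quad \dot{s}_\theta = \left(\frac{d}{d\theta}\right)^t s_\theta \tag{54}$$

and suppose that the functions $M_{\alpha,\theta}(Q,\tilde\theta)$ of (53) are twice differentiable in the vector
variable $\tilde\theta$, with the differentiation and integration interchangeable in (53). Moreover, we
suppose that the derivatives

$$\Psi_{\alpha,\theta}(Q,\tilde\theta) = \frac{d}{d\tilde\theta}M_{\alpha,\theta}(Q,\tilde\theta) = P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta} - Q \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta} \tag{55}$$

admit solutions of the equations $\Psi_{\alpha,\theta}(Q,\tilde\theta) = 0$ in the variable $\tilde\theta \in \Theta$ for $Q \in \mathcal{Q}$.

2.1 Power subdivergence estimators

In this subsection we study the max$\underline{D}_\alpha$-estimators $\theta_{\alpha,\theta,n}$ with the divergence power
parameters $\alpha \ge 0$ and the escort parameters $\theta \in \Theta$. As said above, for $\alpha = 0$ they coincide with
the MLE's (50). Therefore we restrict ourselves to $\alpha > 0$ and to the definition formula
(51), (53).

By assumptions, the argminima

$$T_{\alpha,\theta}(Q) = \operatorname{argmin}_{\tilde\theta} M_{\alpha,\theta}(Q,\tilde\theta), \quad \alpha > 0,\ Q \in \mathcal{Q} \quad (\text{cf. (36)}) \tag{56}$$

solve the equations $\Psi_{\alpha,\theta}(Q,\tilde\theta) = 0$ in the variable $\tilde\theta \in \Theta$ and, in particular, $\theta_{\alpha,\theta,n} =
T_{\alpha,\theta}(P_n)$ are for all $\alpha > 0$ solutions of the equations

$$\Psi_{\alpha,\theta}(P_n,\tilde\theta) = 0 \tag{57}$$

in the variable $\tilde\theta \in \Theta$.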



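Theorem 2.1.1. The influence functions of the max$\underline{D}_\alpha$-estimators $\theta_{\alpha,\theta,n}$ under
consideration are at $P_{\theta_0}$ given by the formula

$$\mathrm{IF}(x; T_{\alpha,\theta}, \theta_0) = I_{\alpha,\theta}(\theta_0)^{-1}\left[\left(\frac{p_\theta(x)}{p_{\theta_0}(x)}\right)^{\alpha} s_{\theta_0}(x) - P_{\theta_0} \cdot \left(\frac{p_\theta}{p_{\theta_0}}\right)^{\alpha} s_{\theta_0}\right] \quad \text{if } \alpha > 0 \tag{58}$$

$$\mathrm{IF}(x; T_{0,\theta}, \theta_0) = I(\theta_0)^{-1} s_{\theta_0}(x) \quad \text{otherwise} \tag{59}$$

where

$$I_{\alpha,\theta}(\theta_0) = P_{\theta_0} \cdot \left(\frac{p_\theta}{p_{\theta_0}}\right)^{\alpha} s^t_{\theta_0} s_{\theta_0} \quad \text{if } \alpha > 0 \tag{60}$$

$$I(\theta_0) = P_{\theta_0} \cdot s^t_{\theta_0} s_{\theta_0} \quad \text{if } \alpha = 0. \tag{61}$$

If the escort parameter $\theta$ coincides with the true parameter $\theta_0$ then

$$\mathrm{IF}(x; T_{\alpha,\theta_0}, \theta_0) = I(\theta_0)^{-1} s_{\theta_0}(x) \quad \text{for all } \alpha \ge 0.$$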

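Proof. By (22) and (55),

$$\psi_{\alpha,\theta}(x,\tilde\theta) = \Psi_{\alpha,\theta}(\delta_x,\tilde\theta) = P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta} - \delta_x \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta} \tag{62}$$

and under the assumptions stated above

$$\dot\psi_{\alpha,\theta}(x,\tilde\theta) = \left(\frac{d}{d\tilde\theta}\right)^t \psi_{\alpha,\theta}(x,\tilde\theta) = P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s^t_{\tilde\theta} s_{\tilde\theta} - P_{\tilde\theta} \cdot \Lambda_{\alpha,\theta,\tilde\theta} + \Lambda_{\alpha,\theta,\tilde\theta}(x) \tag{63}$$

for

$$\Lambda_{\alpha,\theta,\tilde\theta}(x) = \left(\frac{p_\theta(x)}{p_{\tilde\theta}(x)}\right)^{\alpha}\left[\alpha\, s^t_{\tilde\theta}(x)\, s_{\tilde\theta}(x) - \dot{s}_{\tilde\theta}(x)\right].$$

Further, by (27), (24) and (63),

$$I_{\alpha,\theta}(\theta_0) = P_{\theta_0} \cdot \dot\psi_{\alpha,\theta}(x,\theta_0) = P_{\theta_0} \cdot \left(\frac{p_\theta}{p_{\theta_0}}\right)^{\alpha} s^t_{\theta_0} s_{\theta_0}$$

and (25) leads to the influence functions

$$\mathrm{IF}(x; T_{\alpha,\theta}, \theta_0) = -I_{\alpha,\theta}(\theta_0)^{-1}\,\psi_{\alpha,\theta}(x,\theta_0).$$

The substitution from (62) yields the desired formula (58). In the MLE case $\alpha = 0$ we
get for all escort parameters $\theta$ the classical MLE influence function (59) with the classical
Fisher information matrix given in (61). This influence function is obtained also if the
escort parameter $\theta$ coincides with the true parameter $\theta_0$ as in this case the estimators
with all power parameters $\alpha \ge 0$ reduce to the MLE (cf. (50)). ■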

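Next follow special examples of the influence functions (58), (59).

Example 2.1.1: Power subdivergence estimators in normal family. Let the
observation space $(\mathcal{X},\mathcal{A})$ be the Borel line $(\mathbb{R},\mathcal{B})$ and $\mathcal{P} = \{P_{\mu,\sigma} : \mu \in \mathbb{R},\ \sigma > 0\}$ the
normal family with parameters of location $\mu$ and scale $\sigma$ (i.e. variances $\sigma^2$). We are
interested in the max$\underline{D}_\alpha$-estimates $(\mu_{\alpha,\mu,\sigma,n}, \sigma_{\alpha,\mu,\sigma,n})$ with power parameters $\alpha \ge 0$ and
escort parameters $(\mu,\sigma) \in \mathbb{R} \otimes (0,\infty)$.

If $\alpha = 0$ then these estimators reduce for all escort parameters $\mu$, $\sigma$ to the well known
MLE's

$$(\mu_{0,\mu,\sigma,n},\ \sigma_{0,\mu,\sigma,n}) = \left(\frac{1}{n}\sum_{i=1}^n X_i,\ \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \mu_{0,\mu,\sigma,n})^2}\right). \tag{64}$$

For $0 < \alpha < 1$ the function (53) takes on the form

$$M_{\alpha,\mu,\sigma}(Q,\tilde\mu,\tilde\sigma) = \frac{1}{1-\alpha}\,P_{\tilde\mu,\tilde\sigma} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} + \frac{1}{\alpha}\,Q \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} \tag{65}$$

where

$$\left(\frac{p_{\mu,\sigma}(x)}{p_{\tilde\mu,\tilde\sigma}(x)}\right)^{\alpha} = \left(\frac{\tilde\sigma}{\sigma}\right)^{\alpha} \exp\left\{\frac{\alpha(x-\tilde\mu)^2}{2\tilde\sigma^2} - \frac{\alpha(x-\mu)^2}{2\sigma^2}\right\} \tag{66}$$

and

$$P_{\tilde\mu,\tilde\sigma} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} = \frac{\tilde\sigma^{\alpha}\sigma^{1-\alpha}}{\sqrt{\alpha\tilde\sigma^2 + (1-\alpha)\sigma^2}} \exp\left\{-\frac{\alpha(1-\alpha)(\mu-\tilde\mu)^2}{2[\alpha\tilde\sigma^2 + (1-\alpha)\sigma^2]}\right\}. \tag{67}$$

Using the likelihood ratio function (66) and the score function

$$s_{\mu,\sigma}(x) = \left(\frac{x-\mu}{\sigma^2},\ \frac{1}{\sigma}\left[\left(\frac{x-\mu}{\sigma}\right)^2 - 1\right]\right) \tag{68}$$

one obtains for all $\alpha > 0$ the derivative

$$\Psi_{\alpha,\mu,\sigma}(Q,\tilde\mu,\tilde\sigma) = P_{\tilde\mu,\tilde\sigma} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} s_{\tilde\mu,\tilde\sigma} - Q \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} s_{\tilde\mu,\tilde\sigma} \tag{69}$$

and the max$\underline{D}_\alpha$-estimators as the argminima

$$(\mu_{\alpha,\mu,\sigma,n},\ \sigma_{\alpha,\mu,\sigma,n}) = \operatorname{argmin}_{\tilde\mu,\tilde\sigma}\left[\frac{1}{1-\alpha}\,P_{\tilde\mu,\tilde\sigma} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} + \frac{1}{\alpha n}\sum_{i=1}^n \left(\frac{p_{\mu,\sigma}(X_i)}{p_{\tilde\mu,\tilde\sigma}(X_i)}\right)^{\alpha}\right] \tag{70}$$

or, equivalently, as solutions of the equations

$$P_{\tilde\mu,\tilde\sigma} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\tilde\mu,\tilde\sigma}}\right)^{\alpha} s_{\tilde\mu,\tilde\sigma} - \frac{1}{n}\sum_{i=1}^n \left(\frac{p_{\mu,\sigma}(X_i)}{p_{\tilde\mu,\tilde\sigma}(X_i)}\right)^{\alpha} s_{\tilde\mu,\tilde\sigma}(X_i) = 0. \tag{71}$$

By Theorem 2.1.1, the influence functions of these estimators at $P_{\mu_0,\sigma_0}$ are

$$\mathrm{IF}(x; T_{\alpha,\mu,\sigma}, \mu_0, \sigma_0) = I_{\alpha,\mu,\sigma}(\mu_0,\sigma_0)^{-1}\left[\left(\frac{p_{\mu,\sigma}(x)}{p_{\mu_0,\sigma_0}(x)}\right)^{\alpha} s_{\mu_0,\sigma_0}(x) - P_{\mu_0,\sigma_0} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\mu_0,\sigma_0}}\right)^{\alpha} s_{\mu_0,\sigma_0}\right] \tag{72}$$

for

$$I_{\alpha,\mu,\sigma}(\mu_0,\sigma_0) = P_{\mu_0,\sigma_0} \cdot \left(\frac{p_{\mu,\sigma}}{p_{\mu_0,\sigma_0}}\right)^{\alpha} s^t_{\mu_0,\sigma_0} s_{\mu_0,\sigma_0}. \tag{73}$$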



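Example 2.1.2: Power subdivergence estimators of location. Let in the frame
of the previous example $\mathcal{P} = \{P_\mu : \mu \in \mathbb{R}\}$ be the standard normal family with the location
parameter $\mu$ and scale $\sigma = 1$. Then the function (65) takes on the form

$$M_{\alpha,\mu}(Q,\tilde\mu) = \frac{1}{1-\alpha}\exp\left\{-\frac{\alpha(1-\alpha)(\mu-\tilde\mu)^2}{2}\right\} + \frac{1}{\alpha}\,Q \cdot \eta_{\alpha,\mu}(x,\tilde\mu) \tag{74}$$

for $\alpha > 0$, $\alpha \neq 1$ where

$$\eta_{\alpha,\mu}(x,\tilde\mu) = \exp\left\{\alpha(\tilde\mu - \mu)(\tilde\mu + \mu - 2x)/2\right\}, \quad x \in \mathbb{R}.$$

The max$\underline{D}_\alpha$-estimates $\mu_{\alpha,\mu,n}$ of location $\mu_0$ with the divergence parameters $0 \le \alpha < 1$ and
escort parameters $\mu \in \mathbb{R}$ are the MLE's

$$\mu_{0,\mu,n} = \frac{1}{n}\sum_{i=1}^n X_i \tag{75}$$

if $\alpha = 0$. Otherwise they are the minimizers

$$\mu_{\alpha,\mu,n} = \operatorname{argmin}_{\tilde\mu} M_{\alpha,\mu}(P_n,\tilde\mu) \tag{76}$$

or, equivalently, solutions of the equations

$$\Psi_{\alpha,\mu}(P_n,\tilde\mu) = 0$$

in the variable $\tilde\mu \in \mathbb{R}$ for

$$\Psi_{\alpha,\mu}(Q,\tilde\mu) = Q \cdot (\tilde\mu - x)\,\eta_{\alpha,\mu}(x,\tilde\mu) - \alpha(\tilde\mu - \mu)\exp\left\{-\frac{\alpha(1-\alpha)(\mu-\tilde\mu)^2}{2}\right\}. \tag{77}$$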
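The minimization (76) of the criterion (74) can be sketched in a few lines. The following is a minimal illustration and not the authors' implementation: the golden-section search assumes that the criterion is unimodal on the chosen bracket, and the function names and the `margin` parameter are hypothetical.

```python
import math

def criterion(mu_tilde, data, alpha, escort):
    """M_{alpha,mu}(P_n, mu_tilde) of (74) for 0 < alpha < 1 and escort mu."""
    first = math.exp(-alpha * (1.0 - alpha) * (escort - mu_tilde) ** 2 / 2.0) / (1.0 - alpha)
    eta_mean = sum(
        math.exp(alpha * (mu_tilde - escort) * (mu_tilde + escort - 2.0 * x) / 2.0)
        for x in data
    ) / len(data)
    return first + eta_mean / alpha

def golden_section_min(f, lo, hi, tol=1e-9):
    """Golden-section search; assumes f is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

def location_estimate(data, alpha, escort, margin=3.0):
    """Power subdivergence estimate of location, cf. (76)."""
    lo = min(min(data), escort) - margin
    hi = max(max(data), escort) + margin
    return golden_section_min(lambda m: criterion(m, data, alpha, escort), lo, hi)

sample = [-1.5, -0.5, 0.5, 1.5]
print(location_estimate(sample, alpha=0.5, escort=0.0))  # escort equal to the sample mean
print(location_estimate(sample, alpha=0.5, escort=1.0))  # a different escort
```

When the escort coincides with the sample mean, the estimating function (77) vanishes at that point, so for this symmetric sample the first call should return a value close to zero; with another escort the estimate generally moves away from the sample mean.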
Let $T_{\alpha,\mu}(Q)$ be the solution of the equation $\Psi_{\alpha,\mu}(Q,\tilde\mu) = 0$ in the variable $\tilde\mu \in \mathbb{R}$ and
let $Q_{\mu_0}$ denote the shift of the distribution $Q$ by $\mu_0$. Then

$$Q_{\mu_0} \cdot (\tilde\mu - x)\,\eta_{\alpha,\mu}(x,\tilde\mu) = Q \cdot (\tilde\mu - \mu_0 - x)\,\eta_{\alpha,\mu-\mu_0}(x,\tilde\mu - \mu_0)$$

so that $T_{\alpha,\mu}(Q_{\mu_0}) = \mu_0 + T_{\alpha,\mu-\mu_0}(Q)$. This means that the estimators (76) are Fisher
consistent in the normal family $\mathcal{P}_\sigma = \{P_{\mu_0,\sigma} = N(\mu_0,\sigma^2) : \mu_0 \in \mathbb{R}\}$ with $\sigma > 0$ fixed if
and only if the solution $T_{\alpha,\mu}(P_{0,\sigma})$ of the equation

$$P_{0,\sigma} \cdot (\tilde\mu - x)\,\eta_{\alpha,\mu}(x,\tilde\mu) - \alpha(\tilde\mu - \mu)\exp\left\{-\frac{\alpha(1-\alpha)(\mu-\tilde\mu)^2}{2}\right\} = 0 \tag{78}$$

in the variable $\tilde\mu$ satisfies the condition

$$T_{\alpha,\mu}(P_{0,\sigma}) = 0 \quad \text{for all } \mu \in \mathbb{R}. \tag{79}$$

By evaluating the function $P_{0,\sigma} \cdot (\tilde\mu - x)\,\eta_{\alpha,\mu}(x,\tilde\mu)$ of variables $\sigma$, $\mu$, $\tilde\mu$ and inserting it in
(78), one can verify that (79) holds if and only if $\sigma = 1$. The "if" part follows from the
Fisher consistency of $T_{\alpha,\mu}$ established in Theorem 2.2 which implies

$$T_{\alpha,\mu}(P_{0,1}) = T_{\alpha,\mu}(P_0) = 0 \quad \text{for } P_{0,1} = P_0 \in \mathcal{P} \text{ and all } \mu \in \mathbb{R}.$$

However, the "only if" assertion is new and surprising in the sense that it indicates a
relatively easy loss of consistency of the max$\underline{D}_\alpha$-estimators.



Problem 2.1.1. It remains to be verified analytically or by simulations whether the
estimators $\mu_{\alpha,\bar{X}_n,n}$ with the adaptive MLE escort parameters $\bar{X}_n$ are Fisher consistent
under all hypothetical models $P_{\mu,\sigma} = N(\mu,\sigma^2)$, $\sigma > 0$ or, more generally, whether the
adaptive estimators

$$\theta_{\alpha,\tau_n,n} \quad \text{with the MLE escorts } \tau_n = \theta_{0,n} \text{ given by (50)} \tag{80}$$

are Fisher consistent under the hypothetical models $P_{\theta_0}$, and eventually consistent and
robust under contaminated versions of these models.

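Let us turn to the influence curves $\mathrm{IF}(x; T_{\alpha,\mu}, \mu_0)$, $0 \le \alpha < 1$ at the data source $P_{\mu_0}$.
Here $s^t_{\mu_0}(x)\,s_{\mu_0}(x) = s^2_{\mu_0}(x) = (\mu_0 - x)^2$ so that, by (27) and (73),

$$I_{\alpha,\mu}(\mu_0) = \frac{1}{\sqrt{2\pi}}\int (\mu_0 - x)^2 \exp\left\{-\frac{\alpha(x-\mu)^2 + (1-\alpha)(x-\mu_0)^2}{2}\right\}dx \tag{81}$$

$$= \left[1 + \alpha^2(\mu - \mu_0)^2\right]\exp\left\{\frac{\alpha(\alpha-1)(\mu_0-\mu)^2}{2}\right\}.$$

Evaluating the likelihood ratio and the centering term in (72) for the present family we get

$$\mathrm{IF}(x; T_{\alpha,\mu}, \mu_0) = \frac{(x-\mu_0)\exp\left\{\alpha(\mu_0-\mu)(\mu_0+\mu-2x)/2\right\} - \alpha(\mu-\mu_0)\,e^{\alpha(\alpha-1)(\mu_0-\mu)^2/2}}{\left[1 + \alpha^2(\mu-\mu_0)^2\right]e^{\alpha(\alpha-1)(\mu_0-\mu)^2/2}}. \tag{82}$$

This formula remains valid also for $\alpha = 0$ because then it reduces to the well known
influence function

$$\mathrm{IF}(x; \mathrm{MLE}, \mu_0) = x - \mu_0$$

of the MLE $T_{0,\mu}$ which does not depend on the escort parameter $\mu$. We see that
the influence curve (82) is unbounded for all $\mu, \mu_0 \in \mathbb{R}$ and $0 \le \alpha < 1$. For $0 <
\alpha < 1$ and the escort parameters $\mu$ different from the true $\mu_0$ the influence functions
$\mathrm{IF}(x; T_{\alpha,\mu}, \mu_0)$ contain the constant terms $\mathrm{IF}(\mu_0; T_{\alpha,\mu}, \mu_0) \neq 0$ and, moreover, increase to
infinity exponentially for $x \to \infty$ or $x \to -\infty$. Therefore $T_{\alpha,\mu}$ are strongly non-robust.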
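The behaviour described above can be inspected numerically. The sketch below evaluates the influence function of the location estimator by inserting the unit-variance normal densities into the general formulas (58) and (60); the closed forms used in the code are derived under that assumption and the function names are illustrative.

```python
import math

def info_location(alpha, escort, mu0):
    """I_{alpha,mu}(mu0) = [1 + alpha^2 (mu - mu0)^2] exp{alpha (alpha - 1) (mu0 - mu)^2 / 2}."""
    d = escort - mu0
    return (1.0 + alpha ** 2 * d ** 2) * math.exp(alpha * (alpha - 1.0) * d ** 2 / 2.0)

def influence_location(x, alpha, escort, mu0):
    """Influence function of the location estimator at P_{mu0}, following (58)."""
    d = escort - mu0
    # (p_mu(x) / p_mu0(x))^alpha for unit-variance normal densities
    weight = math.exp(alpha * (mu0 - escort) * (mu0 + escort - 2.0 * x) / 2.0)
    # P_{mu0} . (p_mu / p_mu0)^alpha s_{mu0} with the score s_{mu0}(x) = x - mu0
    centering = alpha * d * math.exp(alpha * (alpha - 1.0) * d ** 2 / 2.0)
    return ((x - mu0) * weight - centering) / info_location(alpha, escort, mu0)

for x in (-6.0, -3.0, 0.0, 3.0, 6.0):
    print(x, influence_location(x, 0.5, 1.0, 0.0), influence_location(x, 0.0, 1.0, 0.0))
```

For $\alpha = 0$ the second column reproduces $x - \mu_0$, while for $\alpha = 0.5$ and an escort different from $\mu_0$ the first column shows the nonzero value at $x = \mu_0$ and the exponential growth in one tail.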



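Example 2.1.3: Power subdivergence estimators of scale. Let in the frame of
Example 2.1.1, $\mathcal{P} = \{P_\sigma : \sigma > 0\}$ be the normal family with the location
parameter $\mu = 0$ and scale $\sigma$ and let us consider the max$\underline{D}_\alpha$-estimators $\sigma_{\alpha,\sigma,n}$ of scale $\sigma_0$
with the divergence parameters $0 \le \alpha < 1$ and escort parameters $\sigma > 0$. For $\alpha = 0$ they
reduce to the standard deviations

$$\sigma_{0,\sigma,n} = \left(\frac{1}{n}\sum_{i=1}^n X_i^2\right)^{1/2}$$

and otherwise they are of the form

$$\sigma_{\alpha,\sigma,n} = T_{\alpha,\sigma}(P_n) \quad \text{for} \quad T_{\alpha,\sigma}(Q) = \operatorname{argmin}_{\tilde\sigma} M_{\alpha,\sigma}(Q,\tilde\sigma), \quad Q \in \mathcal{Q}$$

where

$$M_{\alpha,\sigma}(Q,\tilde\sigma) = \widetilde{M}_{\alpha,\sigma}(Q,\tilde\sigma/\sigma)$$

for (cf. (65))

$$\widetilde{M}_{\alpha,\sigma}(Q,s) = \frac{s^\alpha}{(1-\alpha)\sqrt{\alpha s^2 + 1 - \alpha}} + \frac{s^\alpha}{\alpha}\int \exp\left\{\frac{\alpha x^2\left[s^{-2} - 1\right]}{2\sigma^2}\right\}dQ(x).$$

Put in accordance with (22) and (62)

$$\psi_{\alpha,\sigma}(x,\tilde\sigma) = \frac{d}{d\tilde\sigma}M_{\alpha,\sigma}(\delta_x,\tilde\sigma) = \frac{1}{\sigma}\left[\frac{d}{ds}\widetilde{M}_{\alpha,\sigma}(\delta_x,s)\right]_{s=\tilde\sigma/\sigma}$$

$$= \frac{1}{\sigma}\left[s^{\alpha-1}\left(\frac{\alpha\sigma(\sigma^2 - \tilde\sigma^2)}{[\alpha\tilde\sigma^2 + (1-\alpha)\sigma^2]^{3/2}} + \left(1 - \frac{x^2}{\tilde\sigma^2}\right)e^{\alpha x^2[s^{-2}-1]/2\sigma^2}\right)\right]_{s=\tilde\sigma/\sigma}. \tag{83}$$

By differentiating this expression with respect to $\tilde\sigma$ and using (24) we obtain the matrix

$$I_{\alpha,\sigma}(\sigma_0) := I_{\alpha,\sigma}(P_{\sigma_0}) = \left(\frac{\sigma_0}{\sigma}\right)^{\alpha-1}\frac{2\sigma^4 + \alpha^2(\sigma_0^2 - \sigma^2)^2}{\sigma_0\left[\alpha\sigma_0^2 + (1-\alpha)\sigma^2\right]^{5/2}}. \tag{84}$$

Hence, by Theorem 2.1.1, the influence functions of the max$\underline{D}_\alpha$-estimators at the data
generating distributions $P_{\sigma_0}$ are for all $0 \le \alpha < 1$

$$\mathrm{IF}(x; T_{\alpha,\sigma}, \sigma_0) = \Delta_{\alpha,\sigma}(x,\sigma_0) + \frac{\alpha\sigma_0(\sigma_0^2 - \sigma^2)\left[\alpha\sigma_0^2 + (1-\alpha)\sigma^2\right]}{2\sigma^4 + \alpha^2(\sigma_0^2 - \sigma^2)^2} \tag{85}$$

where

$$\Delta_{\alpha,\sigma}(x,\sigma_0) = \frac{\left[\alpha\sigma_0^2 + (1-\alpha)\sigma^2\right]^{5/2}\left[(x/\sigma_0)^2 - 1\right]\exp\left\{\alpha x^2\left[\sigma_0^{-2} - \sigma^{-2}\right]/2\right\}}{\sigma\left[2\sigma^4 + \alpha^2(\sigma_0^2 - \sigma^2)^2\right]/\sigma_0}. \tag{86}$$

This formula remains valid also for $\alpha = 0$ since in this case (85) reduces to the well known
influence function

$$\mathrm{IF}(x; \mathrm{MLE}, \sigma_0) = \frac{\sigma_0\left[(x/\sigma_0)^2 - 1\right]}{2}$$

obtained from the limit values

$$\psi_{0,\sigma}(x,\sigma_0) = -\left[(x/\sigma_0)^2 - 1\right]/\sigma_0 \quad \text{and} \quad I_{0,\sigma}(\sigma_0) = 2/\sigma_0^2$$

which do not depend on the escort parameter $\sigma$. We see from the formula (86) that the
influence curve is unbounded for all $\sigma, \sigma_0 > 0$ and $\alpha \ge 0$. For $\alpha > 0$ and $\sigma \neq \sigma_0$
we get $\mathrm{IF}(\sigma_0; T_{\alpha,\sigma}, \sigma_0) \neq 0$. If moreover $\sigma_0 < \sigma$ then $\mathrm{IF}(x; T_{\alpha,\sigma}, \sigma_0)$ increases to infinity
exponentially fast for $|x| \to \infty$. Thus $T_{\alpha,\sigma}$ with $\alpha > 0$ and $\sigma \neq \sigma_0$ are strongly non-robust.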
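A closed form of the information quantity in the scale example can be cross-checked against direct numerical integration of the general formula (60). The sketch below does this with a trapezoid rule; the closed form in `info_scale_closed` is the expression obtained by inserting the zero-mean normal densities into (60), and the integration range and grid size are assumptions of this illustration.

```python
import math

def info_scale_closed(alpha, sigma, sigma0):
    """Closed form of I_{alpha,sigma}(sigma0) for escort sigma and true scale sigma0."""
    d = alpha * sigma0 ** 2 + (1.0 - alpha) * sigma ** 2
    num = 2.0 * sigma ** 4 + alpha ** 2 * (sigma0 ** 2 - sigma ** 2) ** 2
    return (sigma0 / sigma) ** (alpha - 1.0) * num / (sigma0 * d ** 2.5)

def normal_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def info_scale_numeric(alpha, sigma, sigma0, n=20000):
    """Trapezoid approximation of P_{sigma0} . (p_sigma / p_sigma0)^alpha s_{sigma0}^2, cf. (60)."""
    half_width = 12.0 * max(sigma, sigma0)
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * h
        score = ((x / sigma0) ** 2 - 1.0) / sigma0
        value = normal_pdf(x, sigma) ** alpha * normal_pdf(x, sigma0) ** (1.0 - alpha) * score ** 2
        total += value if 0 < k < n else 0.5 * value
    return total * h

for alpha, sigma, sigma0 in ((0.5, 2.0, 1.0), (0.3, 0.5, 1.5), (0.0, 2.0, 1.0)):
    print(alpha, sigma, sigma0, info_scale_closed(alpha, sigma, sigma0), info_scale_numeric(alpha, sigma, sigma0))
```

The two last columns should agree, and for $\alpha = 0$ both reduce to the classical value $2/\sigma_0^2$.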

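Example 2.1.4: Power subdivergence estimator in Pareto family. It is hard to
find simpler nontrivial examples of the max$\underline{D}_\alpha$-estimators than the estimators of location
(75), (76) from Example 2.1.2. Another relatively simple example is the family of
max$\underline{D}_\alpha$-estimators in the Pareto model with the family of measures $\mathcal{P} = \{P_\theta : \theta > 0\}$ defined on
the interval $\mathcal{X} = (1,\infty)$ by the densities

$$p_\theta(x) = \frac{\theta}{x^{\theta+1}} \tag{87}$$

with the mean values finite and equal to $\theta/(\theta-1)$ in the domain $\theta > 1$ and variances finite and
equal to $\theta/[(\theta-2)(\theta-1)^2]$ in the domain $\theta > 2$. As before, the estimates $\theta_{\alpha,\theta,n}$ depend on
the divergence parameters $\alpha \ge 0$ and escort parameters $\theta > 0$. By (50), for $\alpha = 0$ we get
the MLE estimates

$$\theta_{0,\theta,n} = \operatorname{argmax}_{\tilde\theta} \sum_{i=1}^n \ln p_{\tilde\theta}(X_i) = \left(\frac{1}{n}\sum_{i=1}^n \ln X_i\right)^{-1}.$$

For $0 < \alpha < 1$ we can use the criterion function

$$M_{\alpha,\theta}(Q,\tilde\theta) = \frac{1}{1-\alpha}\,P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} + \frac{1}{\alpha}\,Q \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha}, \quad Q \in \mathcal{Q} \tag{88}$$

of (53), or its derivative

$$\Psi_{\alpha,\theta}(Q,\tilde\theta) = \frac{d}{d\tilde\theta}M_{\alpha,\theta}(Q,\tilde\theta) \tag{89}$$

given by (55), where in the present situation

$$\left(\frac{p_\theta(x)}{p_{\tilde\theta}(x)}\right)^{\alpha} = \frac{\theta^\alpha\, x^{\alpha(\tilde\theta - \theta)}}{\tilde\theta^\alpha} \quad \text{and} \quad s_{\tilde\theta}(x) = \frac{1}{\tilde\theta} - \ln x.$$

Substituting these expressions in (88), (89) we get the desired asymptotic characteristics
of the max$\underline{D}_\alpha$-estimators $\theta_{\alpha,\theta,n}$ obtained as argminima of the functions $M_{\alpha,\theta}(P_n,\tilde\theta)$ or,
equivalently, as solutions of the equations $\Psi_{\alpha,\theta}(P_n,\tilde\theta) = 0$ in the variable $\tilde\theta$. Further, by
(62),

$$\psi_{\alpha,\theta}(x,\tilde\theta) = P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta} - \left(\frac{p_\theta(x)}{p_{\tilde\theta}(x)}\right)^{\alpha} s_{\tilde\theta}(x)$$

and using Theorem 2.1.1 one easily obtains the influence functions of the estimators $\theta_{\alpha,\theta,n}$
under consideration.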
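For the Pareto family the criterion of (53) can be written in closed form, since the integral of $p_\theta^\alpha p_{\tilde\theta}^{1-\alpha}$ over $(1,\infty)$ equals $\theta^\alpha\tilde\theta^{1-\alpha}/(\alpha\theta + (1-\alpha)\tilde\theta)$. The sketch below uses this elementary integral, which is derived here and not quoted from the paper; the function names are illustrative.

```python
import math

def pareto_mle(data):
    """MLE of the Pareto shape parameter: n / sum_i ln X_i."""
    return len(data) / sum(math.log(x) for x in data)

def power_integral(alpha, theta, theta_tilde):
    """P_{theta~} . (p_theta / p_theta~)^alpha, i.e. the integral of
    p_theta^alpha * p_theta~^(1 - alpha) over (1, infinity)."""
    return theta ** alpha * theta_tilde ** (1.0 - alpha) / (alpha * theta + (1.0 - alpha) * theta_tilde)

def criterion(theta_tilde, data, alpha, escort):
    """M_{alpha,theta}(P_n, theta~) for 0 < alpha < 1 and escort theta."""
    ratio_mean = sum(
        (escort / theta_tilde) ** alpha * x ** (alpha * (theta_tilde - escort)) for x in data
    ) / len(data)
    return power_integral(alpha, escort, theta_tilde) / (1.0 - alpha) + ratio_mean / alpha

sample = [1.5, 2.0, 4.0]
mle = pareto_mle(sample)
print("MLE:", mle)
for candidate in (0.5 * mle, mle, 1.5 * mle):
    print(candidate, criterion(candidate, sample, 0.5, mle))
```

With the escort set to the MLE, the estimating function (55) vanishes at the MLE itself, so the criterion printed for the middle candidate should be the smallest of the three.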



2.2 Power superdivergence estimators 

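In this subsection we deal with the min$\overline{D}_\alpha$-estimators $\theta_{\alpha,n}$ with the power parameters
$\alpha \ge 0$. For $\alpha = 0$ they coincide with the MLE's (50). Therefore we consider $\alpha > 0$
when these estimators are defined by (52) and (53). We restrict ourselves for simplicity to
$0 < \alpha < 1$ and denote the function $\Psi_{\alpha,\theta}(Q,\tilde\theta)$ from (55) in the previous subsection temporarily
by $\tilde\Psi_{\alpha,\theta}(Q,\tilde\theta)$, i.e. let

$$\tilde\Psi_{\alpha,\theta}(Q,\tilde\theta) = P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta} - Q \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} s_{\tilde\theta}.$$

Further, let $\tilde{T}_{\alpha,\theta}(Q)$ be the solution of the equation $\tilde\Psi_{\alpha,\theta}(Q,\tilde\theta) = 0$ in variable $\tilde\theta$, i.e.

$$\tilde\Psi_{\alpha,\theta}(Q,\tilde{T}_{\alpha,\theta}(Q)) = 0 \quad \text{for all } \theta \in \Theta. \tag{90}$$

Finally, let $M_{\alpha,\theta}(Q,\tilde{T}_{\alpha,\theta}(Q))$ be the function of variable $\theta \in \Theta$ obtained by inserting
$\tilde\theta = \tilde{T}_{\alpha,\theta}(Q)$ in the function $M_{\alpha,\theta}(Q,\tilde\theta)$ defined in (53). According to (52) and (53), the
maximizers

$$T_\alpha(Q) = \operatorname{argmax}_\theta M_{\alpha,\theta}(Q,\tilde{T}_{\alpha,\theta}(Q)) \tag{91}$$

generate the min$\overline{D}_\alpha$-estimators $\theta_{\alpha,n}$ under consideration in the sense that $\theta_{\alpha,n} = T_\alpha(P_n)$.

In the following theorem we consider the score function $s_\theta = \dot{p}_\theta/p_\theta$ and we put for
brevity $\tilde{T}_{\alpha,\theta} = \tilde{T}_{\alpha,\theta}(Q)$.

Theorem 2.2.1. For all $0 < \alpha < 1$ the maximizers (91) solve the equations $\Psi_\alpha(Q,\theta) = 0$
in variable $\theta \in \Theta$ for the function

$$\Psi_\alpha(Q,\theta) = \frac{d}{d\theta}M_{\alpha,\theta}(Q,\tilde{T}_{\alpha,\theta}) = \frac{\alpha}{1-\alpha}\,P_{\tilde{T}_{\alpha,\theta}} \cdot \left(\frac{p_\theta}{p_{\tilde{T}_{\alpha,\theta}}}\right)^{\alpha} s_\theta + Q \cdot \left(\frac{p_\theta}{p_{\tilde{T}_{\alpha,\theta}}}\right)^{\alpha} s_\theta. \tag{92}$$

Consequently the corresponding min$\overline{D}_\alpha$-estimators $\theta_{\alpha,n} = T_\alpha(P_n)$ are solutions of the
equations

$$\Psi_\alpha(P_n,\theta) = 0. \tag{93}$$

Proof. By (53),

$$M_{\alpha,\theta}(Q,\tilde\theta) = \frac{1}{1-\alpha}\,P_{\tilde\theta} \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha} + \frac{1}{\alpha}\,Q \cdot \left(\frac{p_\theta}{p_{\tilde\theta}}\right)^{\alpha}$$

so that

$$\frac{d}{d\theta}M_{\alpha,\theta}(Q,\tilde{T}_{\alpha,\theta}) = \frac{\alpha}{1-\alpha}\,P_{\tilde{T}_{\alpha,\theta}} \cdot \left(\frac{p_\theta}{p_{\tilde{T}_{\alpha,\theta}}}\right)^{\alpha} s_\theta + Q \cdot \left(\frac{p_\theta}{p_{\tilde{T}_{\alpha,\theta}}}\right)^{\alpha} s_\theta + \left[\frac{d}{d\tau}M_{\alpha,\theta}(Q,\tau)\right]_{\tau = \tilde{T}_{\alpha,\theta}} \frac{d}{d\theta}\tilde{T}_{\alpha,\theta}.$$

Using (90) we obtain (92) and (93). ■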

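Corollary 2.2.1. The influence functions $\mathrm{IF}(x; T_\alpha, \theta)$ of all min$\overline{D}_\alpha$-estimators $\theta_{\alpha,n} =
T_\alpha(P_n)$ with power parameters $0 \le \alpha < 1$ at $P_\theta \in \mathcal{P}$ coincide with the influence function

$$\mathrm{IF}(x; T_0, \theta) = I(\theta)^{-1} s_\theta(x) \quad (\text{cf. (27) and (61)}) \tag{94}$$

of the MLE $\theta_{0,n} = T_0(P_n)$.

Proof. By Theorem 2.2, the max$\underline{D}_\alpha$-estimators $\theta_{\alpha,\theta,n} = \tilde{T}_{\alpha,\theta}(P_n)$ are Fisher consistent.
Hence for $Q = P_{\theta_0}$ we get $\tilde{T}_{\alpha,\theta} := \tilde{T}_{\alpha,\theta}(P_{\theta_0}) = \theta_0$ in (92). Consequently it follows from (22)
and (92) that the $\psi$-functions

$$\psi_\alpha(x,\tilde{T}_{\alpha,\theta_0}) = \Psi_\alpha(\delta_x,\tilde{T}_{\alpha,\theta_0}) = \frac{\alpha}{1-\alpha}\,P_{\tilde{T}_{\alpha,\theta_0}} \cdot \left(\frac{p_{\theta_0}}{p_{\tilde{T}_{\alpha,\theta_0}}}\right)^{\alpha} s_{\theta_0} + \delta_x \cdot \left(\frac{p_{\theta_0}}{p_{\tilde{T}_{\alpha,\theta_0}}}\right)^{\alpha} s_{\theta_0}$$

of these estimators reduce for all $0 < \alpha < 1$ to the score function $s_{\theta_0}(x)$ which is the
$\psi$-function of the MLE $T_0$. Similarly, we get from (27) and (24) for all $0 < \alpha < 1$ the matrix
$I(\theta_0) = P_{\theta_0} \cdot s^t_{\theta_0} s_{\theta_0}$ corresponding to the MLE. Therefore the influence functions of all
min$\overline{D}_\alpha$-estimators under consideration reduce to the MLE influence function (94) which
completes the proof. ■

Formulas for the min$\overline{D}_\alpha$-estimators of the normal location and/or scale are seen from
the examples of Subsection 2.1.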

3 DECOMPOSABLE PSEUDODISTANCES 

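The $\phi$-divergences $D_\phi(P,Q)$, $\phi \in \Phi$ can be characterized by the information processing
property, i.e. by the complete invariance w.r.t. the statistically sufficient transformations
of the observation space $(\mathcal{X},\mathcal{A})$. This property is useful but probably not indispensable in
the minimum distance estimation based on similarity between theoretical and empirical
distributions. Hence we admit in the rest of the paper general pseudodistances $\mathfrak{D}(P,Q)$
which may not satisfy the information processing property.

Definition 3.1. We say that $\mathfrak{D} : \mathcal{P} \otimes \mathcal{P}^+ \mapsto \mathbb{R}$ is a pseudodistance of probability
measures $P \in \mathcal{P} = \{P_\theta : \theta \in \Theta\}$ and $Q \in \mathcal{P}^+$ if

$$\mathfrak{D}(P_\theta, P_{\tilde\theta}) \ge 0 \quad \text{for all } \theta, \tilde\theta \in \Theta \text{ with } \mathfrak{D}(P_\theta, P_{\tilde\theta}) = 0 \text{ iff } \theta = \tilde\theta. \tag{95}$$

An additional restriction imposed in this section on pseudodistances $\mathfrak{D}(P,Q)$ will be
the decomposability.

Definition 3.2. A pseudodistance $\mathfrak{D}$ on $\mathcal{P} \otimes \mathcal{P}^+$ is decomposable if there exist
functionals $\mathfrak{D}^0 : \mathcal{P} \mapsto \mathbb{R}$, $\mathfrak{D}^1 : \mathcal{P}^+ \mapsto \mathbb{R}$ and measurable mappings

$$\rho_\theta : \mathcal{X} \mapsto \mathbb{R}, \quad \theta \in \Theta \tag{96}$$

such that for all $\theta \in \Theta$ and $Q \in \mathcal{P}^+$ the expectations $Q \cdot \rho_\theta$ exist and

$$\mathfrak{D}(P_\theta, Q) = \mathfrak{D}^0(P_\theta) + \mathfrak{D}^1(Q) + Q \cdot \rho_\theta. \tag{97}$$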

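Definition 3.3. We say that a functional $T_{\mathfrak{D}} : \mathcal{Q} \mapsto \Theta$ for $\mathcal{Q} = \mathcal{P}^+ \cup \mathcal{P}_{emp}$ defines
a minimum pseudodistance estimator (briefly, min$\mathfrak{D}$-estimator) if $\mathfrak{D}(P_\theta,Q)$ is
a decomposable pseudodistance on $\mathcal{P} \otimes \mathcal{P}^+$ and the parameters $T_{\mathfrak{D}}(Q) \in \Theta$ minimize
$\mathfrak{D}^0(P_\theta) + Q \cdot \rho_\theta$ on $\Theta$, in symbols

$$T_{\mathfrak{D}}(Q) = \operatorname{argmin}_\theta \left[\mathfrak{D}^0(P_\theta) + Q \cdot \rho_\theta\right] \quad \text{for all } Q \in \mathcal{Q}. \tag{98}$$

In particular, for $Q = P_n \in \mathcal{P}_{emp}$ given by (14),

$$\theta_{\mathfrak{D},n} := T_{\mathfrak{D}}(P_n) = \operatorname{argmin}_\theta \left[\mathfrak{D}^0(P_\theta) + \frac{1}{n}\sum_{i=1}^n \rho_\theta(X_i)\right]. \tag{99}$$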
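The estimator (99) is an ordinary M-estimator and can be sketched generically. In the snippet below the grid search and the function names are assumptions of this illustration; the example choice $\mathfrak{D}^0 = 0$, $\rho_\theta = -\ln p_\theta$ corresponds to a Kullback-type criterion and reproduces the MLE on the grid.

```python
import math

def min_pseudodistance_estimate(d0, rho, sample, grid):
    """argmin over a finite grid of D^0(P_theta) + (1/n) sum_i rho_theta(X_i), cf. (99)."""
    def objective(theta):
        return d0(theta) + sum(rho(theta, x) for x in sample) / len(sample)
    return min(grid, key=objective)

def zero_d0(theta):
    # Illustrative choice D^0 = 0 (an assumption made for this example).
    return 0.0

def neg_log_normal_density(theta, x):
    # rho_theta(x) = -ln p_theta(x) for the unit-variance normal location family.
    return 0.5 * (x - theta) ** 2 + 0.5 * math.log(2.0 * math.pi)

sample = [0.1, 0.4, 1.0, 1.3]
grid = [k / 100.0 for k in range(-200, 201)]
print(min_pseudodistance_estimate(zero_d0, neg_log_normal_density, sample, grid))
```

For this sample the grid minimizer coincides with the sample mean 0.7, as expected for the MLE of normal location.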
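Theorem 3.1. Every min$\mathfrak{D}$-estimator

$$\theta_{\mathfrak{D},n} = \operatorname{argmin}_\theta \left[\mathfrak{D}^0(P_\theta) + \frac{1}{n}\sum_{i=1}^n \rho_\theta(X_i)\right] \tag{100}$$

is Fisher consistent in the sense that

$$T_{\mathfrak{D}}(P_{\theta_0}) = \operatorname{argmin}_\theta \mathfrak{D}(P_\theta, P_{\theta_0}) = \theta_0 \quad \text{for all } \theta_0 \in \Theta. \tag{101}$$

Proof. Consider arbitrary fixed $\theta_0 \in \Theta$. Then, by assumptions, $\mathfrak{D}^1(P_{\theta_0})$ is a finite
constant. Therefore (98) together with the definition of pseudodistance implies

$$T_{\mathfrak{D}}(P_{\theta_0}) = \operatorname{argmin}_\theta \left[\mathfrak{D}^0(P_\theta) + P_{\theta_0} \cdot \rho_\theta\right]$$

$$= \operatorname{argmin}_\theta \left[\mathfrak{D}^0(P_\theta) + \mathfrak{D}^1(P_{\theta_0}) + P_{\theta_0} \cdot \rho_\theta\right]$$

$$= \operatorname{argmin}_\theta \mathfrak{D}(P_\theta, P_{\theta_0}) = \theta_0. \ ■$$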

The decomposability of the pseudodistance $\mathfrak{D}(P_\theta, Q)$ leads to the additive structure of the
criterion

$$\mathfrak{D}(P_\theta, P_n) \sim \mathfrak{D}^0(P_\theta) + P_n \cdot \rho_\theta = \mathfrak{D}^0(P_\theta) + \frac{1}{n}\sum_{i=1}^n \rho_\theta(X_i) \tag{102}$$

in the definition (100) of the min$\mathfrak{D}$-estimators which opens the possibility to apply the
methods of the asymptotic theory of M-estimators (cf. Hampel et al. (1986), van der
Vaart and Wellner (1996), van der Vaart (1998) or Miescke and Liese (2008)).

The general min$\mathfrak{D}$-estimators and their special classes studied in Subsections 3.1,
3.2 below were introduced in Vajda (2008). They contain as a subclass all the max$\underline{D}_\phi$-estimators
of Section 2. To see this suppose that the assumptions of Section 2 related to
the estimators (104) hold and consider for arbitrary fixed $(\phi,\tau) \in \Phi \otimes \Theta$ the well defined
expressions

$$\mathfrak{D}^0_{\phi,\tau}(P_\theta) = -P_\tau \cdot \phi'\!\left(\frac{p_\tau}{p_\theta}\right), \qquad \rho_{\phi,\tau,\theta} = -\phi^{\#}\!\left(\frac{p_\tau}{p_\theta}\right)$$

and

$$\mathfrak{D}^1_{\phi,\tau}(Q) = -\inf_\theta \left[\mathfrak{D}^0_{\phi,\tau}(P_\theta) + Q \cdot \rho_{\phi,\tau,\theta}\right].$$

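Theorem 3.2. The sum

$$\mathfrak{D}(P_\theta,Q):=\mathfrak{D}^0_{\phi,\tau}(P_\theta)+\mathfrak{D}^1_{\phi,\tau}(Q)+Q\cdot\rho_{\phi,\tau,\theta} \tag{103}$$

is a pseudodistance on $\mathcal{P}\otimes\mathcal{P}^+$ and the maximum subdivergence estimator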

'Pr{Xi 



9<t>, T ,n = argmax e 



(104) 



of Section 2 with the divergence parameter $\phi\in\Phi$ and escort parameter $\tau\in\Theta$ is the min $\mathfrak{D}$-estimator for the decomposable pseudodistance (103).

Proof. Fix $(\phi,\tau)\in\Phi\otimes\Theta$ and let the assumptions of Section 2 related to the estimators (104) hold. Then for any $\theta_0\in\Theta$



D(p, ,g) = 2)J iT (p, ) + g 

If Q &V then, by (JSH) and (133]) . 



inf [2)J, T (PO+G-^r,flb] >0. 



®<t>A p 8o,Q) = sup 



P 9n .^(M +Q .^(M|_P 9n .^('M +Q . »('M 

\PeJ \PeJ\ \PeJ \Pe J 



= D^Pe ,Q)-D^ T (Pe ,Q). 

By Theorem 2.1, this difference is zero if and only if $Q=P_{\theta_0}$, which proves that (103) is a pseudodistance on $\mathcal{P}\otimes\mathcal{P}^+$. On the other hand, obviously, (104) satisfies

$$\theta_{\phi,\tau,n}=\operatorname*{argmin}_\theta\left[\mathfrak{D}^0_{\phi,\tau}(P_\theta)+P_n\cdot\rho_{\phi,\tau,\theta}\right]$$

so that it is the min $\mathfrak{D}$-estimator for the pseudodistance (103), which completes the proof. ■

The minimum superdivergence estimators $\theta_{\phi,n}$ of Section 2 (the min $D_\phi$-estimators) minimize the suprema

$$\sup_\tau\mathfrak{D}(P_\theta,Q)\quad\text{for }Q=P_n$$

of the decomposable pseudodistance (103). However, the suprema of decomposable pseudodistances are not in general decomposable pseudodistances. Therefore the standard theory of M-estimators is not applicable to this class of estimators. An exception is the MLE $\theta_{\phi_0,n}$ obtained for the logarithmic function $\phi_0$ given in (7).



3.1 Power pseudodistance estimators 

In this subsection we study a special class of pseudodistances $\mathfrak{D}_\psi(P_\theta,Q)$ defined on $\mathcal{P}\otimes\mathcal{P}^+$ by the integral formula

$$\mathfrak{D}_\psi(P_\theta,Q)=\int\psi(p_\theta,q)\,d\lambda\quad\text{for }p_\theta=\frac{dP_\theta}{d\lambda},\ q=\frac{dQ}{d\lambda} \tag{105}$$

where $\psi(s,t)$ are reflexive in the sense that they are nonnegative functions of arguments $s,t>0$ with $\psi(s,t)=0$ iff $s=t$. If a function $\psi$ is reflexive and also decomposable in the sense

$$\psi(s,t)=\psi^0(s)+\psi^1(t)+\rho(s)\,t,\qquad s,t>0 \tag{106}$$

for some $\psi^0,\psi^1,\rho:(0,\infty)\to\mathbb{R}$ then the corresponding $\psi$-pseudodistance (105) is a decomposable pseudodistance satisfying

$$\mathfrak{D}_\psi(P_\theta,Q)=\mathfrak{D}^0_\psi(P_\theta)+\mathfrak{D}^1_\psi(Q)+Q\cdot\rho_\theta\quad(\text{cf. (97)}) \tag{107}$$

for

$$\mathfrak{D}^0_\psi(P_\theta)=\int\psi^0(p_\theta)\,d\lambda,\quad\mathfrak{D}^1_\psi(Q)=\int\psi^1(q)\,d\lambda\quad\text{and}\quad\rho_\theta=\rho(p_\theta). \tag{108}$$

Example 3.1.1. The $\phi$-divergences $D_\phi(P_\theta,Q)$ are special $\psi$-pseudodistances (105) for the functions

$$\psi(s,t)=\phi(s/t)\,t-\phi'(1)(s-t),\qquad s,t>0 \tag{109}$$

since they are nonnegative and reflexive, and (109) implies $\mathfrak{D}_\psi(P_\theta,Q)=D_\phi(P_\theta,Q)$ for all $P_\theta\in\mathcal{P}$, $Q\in\mathcal{P}^+$ when $\phi\in\Phi$ and $\psi$ are related by (109). However, the functions (109) in general do not satisfy the decomposability condition (106), so that the $\phi$-divergences are not in general decomposable pseudodistances. An exception is the logarithmic function $\phi=\phi_0$ defined in (7), for which the min $\mathfrak{D}_\psi$-estimator is the MLE.

Example 3.1.2: $L_2$-estimator. The quadratic function $\psi(s,t)=(s-t)^2$ is reflexive and also decomposable in the sense of (106). Thus it defines the decomposable pseudodistance

$$\mathfrak{D}_\psi(P_\theta,Q)=\int(p_\theta-q)^2\,d\lambda=\|p_\theta-q\|^2$$

on $\mathcal{P}\otimes\mathcal{P}^+$ for $\mathcal{P}^+\subset L_2(\lambda)$. It is easy to verify that the decomposability in the sense of (107) holds for

$$\mathfrak{D}^0_\psi(P_\theta)=\int p_\theta^2\,d\lambda,\quad\mathfrak{D}^1_\psi(Q)=\int q^2\,d\lambda\quad\text{and}\quad\rho_\theta=-2p_\theta.$$

The corresponding min $\mathfrak{D}_\psi$-estimator defined by (100) is in this case the $L_2$-estimator

$$\theta_n=\operatorname*{argmin}_\theta\left[\int p_\theta^2\,d\lambda-\frac2n\sum_{i=1}^np_\theta(X_i)\right] \tag{110}$$

which is known to be robust but not efficient (see e.g. Hampel et al. (1986)).



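To build a smooth bridge between robustness and efficiency, one needs to replace the reflexive and decomposable functions $\psi$ by families $\{\psi_\alpha:\alpha\ge0\}$ of reflexive functions decomposable in the sense

$$\psi_\alpha(s,t)=\psi^0_\alpha(s)+\psi^1_\alpha(t)+\rho_\alpha(s)\,t\quad\text{for all }\alpha\ge0\quad(\text{cf. (106)}) \tag{111}$$

with the limits at $\alpha\downarrow0$ satisfying for some constant $\varkappa$ and all $s>0$ the conditions

$$\psi^0_0(s)=\lim_{\alpha\downarrow0}\psi^0_\alpha(s)=\varkappa s\quad\text{and}\quad\lim_{\alpha\downarrow0}\rho_\alpha(s)=\rho_0(s)=-\ln s. \tag{112}$$

Then for all $\alpha\ge0$ and $(P_\theta,Q)\in\mathcal{P}\otimes\mathcal{P}^+$ the family of $\psi_\alpha$-pseudodistances

$$\mathfrak{D}_\alpha(P_\theta,Q):=\mathfrak{D}_{\psi_\alpha}(P_\theta,Q),\qquad\alpha\ge0 \tag{113}$$

satisfies the decomposability condition

$$\mathfrak{D}_\alpha(P_\theta,Q)=\mathfrak{D}^0_\alpha(P_\theta)+\mathfrak{D}^1_\alpha(Q)+Q\cdot\rho_{\alpha,\theta}\quad(\text{cf. (107)}) \tag{114}$$

for

$$\mathfrak{D}^0_\alpha(P_\theta)=\int\psi^0_\alpha(p_\theta)\,d\lambda,\quad\mathfrak{D}^1_\alpha(Q)=\int\psi^1_\alpha(q)\,d\lambda\quad\text{and}\quad\rho_{\alpha,\theta}=\rho_\alpha(p_\theta). \tag{115}$$

In other words, the pseudodistances $\mathfrak{D}_\alpha(P_\theta,Q)$ defined by (113) are decomposable and define in accordance with (100) the family of min $\mathfrak{D}_\alpha$-estimators

$$\theta_{\alpha,n}=\operatorname*{argmin}_\theta\left[\mathfrak{D}^0_\alpha(P_\theta)+P_n\cdot\rho_{\alpha,\theta}\right] \tag{116}$$

$$=\operatorname*{argmin}_\theta\left[\mathfrak{D}^0_\alpha(P_\theta)+\frac1n\sum_{i=1}^n\rho_\alpha(p_\theta(X_i))\right],\qquad\alpha\ge0. \tag{117}$$

Here (112) guarantees that this family contains as a special case for $\alpha=0$ the efficient but non-robust MLE

$$\theta_{0,n}=\operatorname*{argmin}_\theta\left[\text{const}-\frac1n\sum_{i=1}^n\ln p_\theta(X_i)\right] \tag{118}$$

while for $\alpha>0$ the estimators $\theta_{\alpha,n}$ are expected to be less efficient but more robust than $\theta_{0,n}$.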

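The rest of this subsection studies a special family of decomposable pseudodistances $\mathfrak{D}_\alpha(P_\theta,Q)$. It is defined on $\mathcal{P}\otimes\mathcal{Q}$ in accordance with (113) and (105) by the functions

$$\psi_\alpha(s,t)=t^{1+\alpha}\left[\alpha\,\phi_{1+\alpha}\!\left(\frac st\right)+(1-\alpha)\,\phi_\alpha\!\left(\frac st\right)\right],\qquad\alpha\ge0 \tag{119}$$

of variables $s,t>0$ where $\phi_{1+\alpha}$ and $\phi_\alpha$ are the power functions defined by (6), (7). These functions satisfy (111), (112) as it is clarified by the next theorem. In this theorem and in the sequel we use for the function (119) the relations

$$\psi_\alpha(s,t)=\frac{s^{1+\alpha}}{1+\alpha}+\frac{t^{1+\alpha}}{\alpha(1+\alpha)}-\frac{t\,s^\alpha}{\alpha} \tag{120}$$

$$=\frac{s^{1+\alpha}-t^{1+\alpha}}{1+\alpha}+t\left[\frac{t^\alpha-1}{\alpha}-\frac{s^\alpha-1}{\alpha}\right] \tag{121}$$

when $\alpha>0$ and

$$\psi_0(s,t)=s-t+t\ln t-t\ln s \tag{122}$$

$$=\lim_{\alpha\downarrow0}\left(\frac{s^{1+\alpha}-t^{1+\alpha}}{1+\alpha}+t\left[\frac{t^\alpha-1}{\alpha}-\frac{s^\alpha-1}{\alpha}\right]\right) \tag{123}$$

when $\alpha=0$.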



Theorem 3.1.1. The power functions (119) are reflexive and decomposable in the sense of (111) with

$$\psi^0_\alpha(s)=\frac{s^{1+\alpha}}{1+\alpha},\qquad\psi^1_\alpha(t)=\begin{cases}t^{1+\alpha}\left[\dfrac1\alpha-\dfrac1{1+\alpha}\right]&\text{if }\alpha>0\\[2mm]t\ln t-t&\text{if }\alpha=0\end{cases}\qquad\text{and}\qquad\rho_\alpha(s)=\begin{cases}-\dfrac{s^\alpha}{\alpha}&\text{if }\alpha>0\\[2mm]-\ln s&\text{if }\alpha=0.\end{cases} \tag{124}$$

Moreover, this family is continuous in the parameter $\alpha\downarrow0$ and satisfies (112) for $\varkappa=1$.

Proof. Decomposition (111) for the function $\psi_\alpha(s,t)$ of (119) into the components (124) is clear from (121) when $\alpha>0$ and (122) when $\alpha=0$. The continuity in the parameter $\alpha\downarrow0$ and (112) for $\varkappa=1$ follow from (123). We shall prove the nonnegativity and reflexivity. For arbitrary arguments $s,t>0$ and fixed parameters $a,b>0$ with the property $1/a+1/b=1$ it holds

$$st\le\frac{s^a}{a}+\frac{t^b}{b} \tag{125}$$

where $=$ takes place iff $s^a=t^b$. Indeed, from the strict concavity of the logarithmic function we deduce the inequality

$$\ln(st)=\frac1a\ln s^a+\frac1b\ln t^b\le\ln\left(\frac{s^a}{a}+\frac{t^b}{b}\right)$$

and the stated condition for equality. Substituting $s\to s^\alpha$, $a\to(1+\alpha)/\alpha$ and $b\to1+\alpha$ for $\alpha>0$ we get

$$s^\alpha t\le\frac{s^{1+\alpha}}{(1+\alpha)/\alpha}+\frac{t^{1+\alpha}}{1+\alpha}$$

with the equality condition $s^{\alpha a}=t^b$, i.e. $s^{1+\alpha}=t^{1+\alpha}$. Dividing by $\alpha$ and comparing with (120), this implies that the function $\psi_\alpha(s,t)$ is nonnegative and reflexive. ■
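The nonnegativity and reflexivity of the power functions, and their continuity at $\alpha\downarrow0$, can be checked numerically. The following is only a minimal sketch, assuming the closed forms $\psi_\alpha(s,t)=s^{1+\alpha}/(1+\alpha)+t^{1+\alpha}/(\alpha(1+\alpha))-ts^\alpha/\alpha$ for $\alpha>0$ (cf. (120)) and $\psi_0(s,t)=s-t+t\ln t-t\ln s$ (cf. (122)); the grid and the test point $(2,3)$ are arbitrary illustrative choices, not taken from the paper.

```python
import math


def psi(alpha, s, t):
    """Power function psi_alpha(s, t): form (120) for alpha > 0, form (122) for alpha = 0."""
    if alpha == 0.0:
        return s - t + t * math.log(t) - t * math.log(s)
    return (s ** (1 + alpha) / (1 + alpha)
            + t ** (1 + alpha) / (alpha * (1 + alpha))
            - t * s ** alpha / alpha)


# Illustrative grid of arguments s, t in (0, 5].
grid = [0.1 * k for k in range(1, 51)]

# Reflexivity: strictly positive off the diagonal, zero on the diagonal.
min_off_diag = min(psi(0.5, s, t) for s in grid for t in grid if s != t)
max_on_diag = max(abs(psi(0.5, s, s)) for s in grid)

# Continuity at alpha -> 0: psi_alpha approaches psi_0.
limit_gap = abs(psi(1e-6, 2.0, 3.0) - psi(0.0, 2.0, 3.0))
```

A grid check of this kind does not replace the proof; it only guards against transcription errors in the formulas.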

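By (113), (105) and Theorem 3.1.1, the power functions (119) generate

$$\mathfrak{D}^0_\alpha(P_\theta)=\frac{1}{1+\alpha}\,P_\theta\cdot p_\theta^\alpha\quad\text{and}\quad\rho_\alpha(p_\theta)=\begin{cases}-\dfrac1\alpha\,p_\theta^\alpha&\text{if }\alpha>0\\[2mm]-\ln p_\theta&\text{if }\alpha=0\end{cases} \tag{126}$$

and define the family of decomposable pseudodistances

$$\mathfrak{D}_\alpha(P_\theta,Q)=\begin{cases}\dfrac{1}{1+\alpha}\,P_\theta\cdot p_\theta^\alpha+\dfrac{1}{\alpha(1+\alpha)}\,Q\cdot q^\alpha-\dfrac1\alpha\,Q\cdot p_\theta^\alpha&\text{if }\alpha>0\\[2mm]Q\cdot(\ln q-\ln p_\theta)&\text{if }\alpha=0\end{cases} \tag{127}$$

in (117). The relation of this family to the family of power divergences $D_\alpha(P_\theta,Q)$ defined by (5) is rigorously established in the next theorem. It refers to the auxiliary family of functions

$$\varphi_\alpha(s,t)=t\left[\alpha\,\phi_{1+\alpha}\!\left(\frac st\right)+(1-\alpha)\,\phi_\alpha\!\left(\frac st\right)\right] \tag{128}$$

of arguments $s,t>0$ parametrized by $\alpha\ge0$.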



Theorem 3.1.2. The decomposable pseudodistances (127) are for all $(P,Q)\in\mathcal{P}\otimes\mathcal{P}^+$ modified power divergences $D_\alpha(P,Q)$ and $D_{1+\alpha}(P,Q)$ in the sense that the pseudodistance densities $\psi_\alpha(p,q)$ are weighted densities $\varphi_\alpha(p,q)$ of the mixed power divergences

$$\int\varphi_\alpha(p,q)\,d\lambda=\alpha\,D_{1+\alpha}(P,Q)+(1-\alpha)\,D_\alpha(P,Q) \tag{129}$$

with the power weights $w_\alpha(q)=q^\alpha$, i.e. $\psi_\alpha(p,q)=w_\alpha(q)\,\varphi_\alpha(p,q)$ on $(\mathcal{X},\mathcal{A})$.

Proof. By (128),

$$\int\varphi_\alpha(p,q)\,d\lambda=\alpha\int q\,\phi_{1+\alpha}(p/q)\,d\lambda+(1-\alpha)\int q\,\phi_\alpha(p/q)\,d\lambda$$

$$=\alpha\,D_{1+\alpha}(P,Q)+(1-\alpha)\,D_\alpha(P,Q). \tag{130}$$

By (119), $\psi_\alpha(s,t)=t^\alpha\varphi_\alpha(s,t)$ so that, by the first equality in (127),

$$\mathfrak{D}_\alpha(P_\theta,Q)=\int\psi_\alpha(p_\theta,q)\,d\lambda=\int w_\alpha(q)\,\varphi_\alpha(p_\theta,q)\,d\lambda.$$

This together with (130) implies the desired result. ■

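Due to Theorem 3.1.2, we call the pseudodistances $\mathfrak{D}_\alpha(P,Q)$ simply power pseudodistances of orders $\alpha\ge0$. The next theorem guarantees finiteness and continuity of these divergences. It is restricted to the families $\mathcal{P}$ satisfying for some $\beta>0$ the condition

$$p^\beta,\ q^\beta,\ \ln p\in L_1(Q)\quad\text{for all }P\in\mathcal{P},\ Q\in\mathcal{P}^+. \tag{131}$$

Theorem 3.1.3. If (131) holds for some $\beta>0$ then for all $0<\alpha<\beta$ the modified power divergences are well defined by (127) and finite, satisfying for all $P\in\mathcal{P}$, $Q\in\mathcal{P}^+$ the continuity relation

$$\lim_{\alpha\downarrow0}\mathfrak{D}_\alpha(P,Q)=\mathfrak{D}_0(P,Q). \tag{132}$$

Proof. By (121),

$$\mathfrak{D}_\alpha(P,Q)=\frac{1}{1+\alpha}\left(P\cdot p^\alpha-Q\cdot q^\alpha\right)+Q\cdot\left(\frac{q^\alpha-1}{\alpha}-\frac{p^\alpha-1}{\alpha}\right).$$

By means of the indicator function $\mathbf{1}$ we can decompose

$$P\cdot p^\alpha=P\cdot\left(p^\alpha\mathbf{1}(p\le1)\right)+P\cdot\left(p^\alpha\mathbf{1}(p>1)\right)$$

where

$$\lim_{\alpha\downarrow0}P\cdot\left(p^\alpha\mathbf{1}(p\le1)\right)=P\cdot\left(\mathbf{1}(p\le1)\right)$$

by the Lebesgue bounded convergence theorem for integrals and

$$\lim_{\alpha\downarrow0}P\cdot\left(p^\alpha\mathbf{1}(p>1)\right)=P\cdot\left(\mathbf{1}(p>1)\right)$$

by the monotone convergence theorem for integrals. Therefore

$$\lim_{\alpha\downarrow0}P\cdot p^\alpha=P\cdot\left(\mathbf{1}(p\le1)\right)+P\cdot\left(\mathbf{1}(p>1)\right)=1.$$

Similarly, $\lim_{\alpha\downarrow0}Q\cdot q^\alpha=1$. The convergences

$$\lim_{\alpha\downarrow0}Q\cdot\frac{q^\alpha-1}{\alpha}=Q\cdot\ln q\quad\text{and}\quad\lim_{\alpha\downarrow0}Q\cdot\frac{p^\alpha-1}{\alpha}=Q\cdot\ln p$$

follow from the monotone convergence as well, because for every fixed $t>0$

$$\frac{d}{d\alpha}\,\frac{t^\alpha-1}{\alpha}=\frac{1-t^\alpha(1-\ln t^\alpha)}{\alpha^2}\ge\frac{1-t^\alpha t^{-\alpha}}{\alpha^2}=0$$

so that the expressions $(q^\alpha-1)/\alpha$ and $(p^\alpha-1)/\alpha$ tend monotonically to $\ln q$ and $\ln p$.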

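By (124) the expressions $\mathfrak{D}^0_\alpha(P_\theta)$ considered in (116), (117) are now given by

$$\mathfrak{D}^0_\alpha(P_\theta)=\frac{1}{1+\alpha}\int p_\theta^{1+\alpha}\,d\lambda\quad\text{for all }\alpha\ge0.$$

Therefore the formulas (116), (117) and (126) lead to the power pseudodistance estimators (briefly, min $\mathfrak{D}_\alpha$-estimators)

$$\theta_{\alpha,n}=\begin{cases}\operatorname*{argmin}_\theta\left[\dfrac{1}{1+\alpha}\displaystyle\int p_\theta^{1+\alpha}\,d\lambda-\dfrac{1}{n\alpha}\sum_{i=1}^np_\theta^\alpha(X_i)\right]&\text{if }\alpha>0\\[3mm]\operatorname*{argmax}_\theta\dfrac1n\displaystyle\sum_{i=1}^n\ln p_\theta(X_i)&\text{if }\alpha=0.\end{cases} \tag{133}$$

Here the upper objective function can be replaced by

$$\frac1\alpha+\frac{1}{1+\alpha}\int p_\theta^{1+\alpha}\,d\lambda-\frac{1}{n\alpha}\sum_{i=1}^np_\theta^\alpha(X_i)=\frac{1}{1+\alpha}\int p_\theta^{1+\alpha}\,d\lambda-\frac1n\sum_{i=1}^n\frac{p_\theta^\alpha(X_i)-1}{\alpha}$$

which tends for $\alpha\downarrow0$ to the constant 1 minus the lower criterion function, so that minimizing it corresponds in the limit to maximizing the lower criterion. Therefore, if for a fixed $n$ the minima of all functions in (133) are in a compact subset of $\Theta$ and the MLE $\theta_{0,n}$ is unique then

$$\lim_{\alpha\downarrow0}\theta_{\alpha,n}=\theta_{0,n}. \tag{134}$$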



Example 3.1.3: $L_2$-estimator revisited. By (133), the min $\mathfrak{D}_\alpha$-estimator of order $\alpha=1$ is defined by

$$\theta_{1,n}=\operatorname*{argmin}_\theta\left[\int p_\theta^2\,d\lambda-\frac2n\sum_{i=1}^np_\theta(X_i)\right]$$

so that it is nothing but the $L_2$-estimator $\theta_n$ from Example 3.1.2. The family of estimators $\theta_{\alpha,n}$ from (133) smoothly connects this robust estimator with the efficient MLE $\theta_{0,n}$ when the parameter $\alpha$ decreases from 1 to 0.



Remark 3.1.1. The special class of the min $\mathfrak{D}_\alpha$-estimators $\theta_{\alpha,n}$ given by (133) was proposed by Basu et al. (1998), who confirmed their efficiency for $\alpha\approx0$ and their intuitively expected robustness for $\alpha>0$. These authors called $\theta_{\alpha,n}$ minimum density power divergence estimators without actual clarification of the relation of the "density power divergences" $\mathfrak{D}_\alpha(P,Q)$ to the standard power divergences $D_\alpha(P,Q)$ studied in Liese and Vajda (1987) and Read and Cressie (1988). Theorem 3.1.2, which explains $\mathfrak{D}_\alpha(P,Q)$ as a convex mixture of modified power divergences $D_\alpha(P,Q)$ and $D_{1+\alpha}(P,Q)$, where the modification means weighting of the power divergence densities by the power $q^\alpha$ of the second probability density, is in this respect an interesting new result.

Remark 3.1.2. The formula (133) can be given the equivalent form

$$\theta_{\alpha,n}=\operatorname*{argmax}_\theta\begin{cases}\dfrac1n\displaystyle\sum_{i=1}^n\dfrac1\alpha\left(p_\theta^\alpha(X_i)-1\right)-\dfrac{1}{1+\alpha}\int p_\theta^{1+\alpha}\,d\lambda&\text{if }\alpha>0\\[3mm]\dfrac1n\displaystyle\sum_{i=1}^n\ln p_\theta(X_i)&\text{if }\alpha=0.\end{cases} \tag{135}$$

If the integral does not depend on $\theta$ then (135) is equivalent to

$$\theta_{\alpha,n}=\operatorname*{argmax}_\theta\begin{cases}\dfrac1n\displaystyle\sum_{i=1}^np_\theta^\alpha(X_i)&\text{if }\alpha>0\\[3mm]\dfrac1n\displaystyle\sum_{i=1}^n\ln p_\theta(X_i)&\text{if }\alpha=0.\end{cases} \tag{136}$$

This subclass of the general min $\mathfrak{D}_\alpha$-estimators (135) was included in a wider family of generalized MLE's introduced and studied previously in Vajda (1984, 1986). However, the whole class (135) was not introduced there.

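If the statistical model $((\mathcal{X},\mathcal{A});\ \mathcal{P}=(P_\theta:\theta\in\Theta))$ is reparametrized by $\vartheta=\vartheta(\theta)$ then the new min $\mathfrak{D}_\alpha$-estimates $\vartheta_{\alpha,n}$ are related to the original $\theta_{\alpha,n}$ by $\vartheta_{\alpha,n}=\vartheta(\theta_{\alpha,n})$. If the observations $x\in\mathcal{X}$ are replaced by $y=T(x)$ where $T:(\mathcal{X},\mathcal{A})\to(\mathcal{Y},\mathcal{B})$ is a measurable statistic with the inverse $T^{-1}$ then the densities

$$\tilde p_\theta=\frac{d\tilde P_\theta}{d\tilde\lambda}$$

in the transformed model $((\mathcal{Y},\mathcal{B});\ \tilde{\mathcal{P}}=(\tilde P_\theta=P_\theta T^{-1}:\theta\in\Theta))$ w.r.t. a $\sigma$-finite measure $\tilde\lambda$ dominating $\lambda T^{-1}$ are related to the original densities $p_\theta$ by

$$\tilde p_\theta(y)=p_\theta(T^{-1}y)\,J_T(y) \tag{137}$$

where $J_T(y)=d\lambda T^{-1}/d\tilde\lambda$ is a generalized Jacobian of the statistic $T$. If $\mathcal{X}$, $\mathcal{Y}$ are Euclidean spaces, $\lambda$ is the Lebesgue measure and the inverse mapping $H=T^{-1}$ is differentiable then $J_T(y)$ is the determinant

$$J_T(y)=\left|\frac{d}{dy}H(y)\right|.$$

The min $\mathfrak{D}_\alpha$-estimators are in general not equivariant w.r.t. invertible transformations of observations $T$, unless $\alpha=0$. The following theorem generalizes a similar result of Section 3.4 in Basu et al. (1998).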



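Theorem 3.1.4. The min $\mathfrak{D}_\alpha$-estimates $\tilde\theta_{\alpha,n}$ in the above considered transformed model coincide with the original min $\mathfrak{D}_\alpha$-estimates $\theta_{\alpha,n}$ if the Jacobian $J_T$ of the transformation is a nonzero constant on the transformed observation space $\mathcal{Y}$. Thus if $\mathcal{X}$, $\mathcal{Y}$ are Euclidean spaces then the min $\mathfrak{D}_\alpha$-estimators are equivariant under linear statistics $Tx=ax+b$.

Proof. For $\alpha=0$ the min $\mathfrak{D}_\alpha$-estimator is the MLE whose equivariance is well known. For $\alpha>0$, by definition (133) and (137),

$$\tilde\theta_{\alpha,n}=\operatorname*{argmin}_\theta\left[\frac{1}{1+\alpha}\int\tilde p_\theta^{\,1+\alpha}\,d\tilde\lambda-\frac{1}{n\alpha}\sum_{i=1}^n\tilde p_\theta^{\,\alpha}(TX_i)\right]$$

$$=\operatorname*{argmin}_\theta\left[\frac{1}{1+\alpha}\int p_\theta^{1+\alpha}(x)\,J_T^\alpha(Tx)\,d\lambda(x)-\frac{1}{n\alpha}\sum_{i=1}^np_\theta^\alpha(X_i)\,J_T^\alpha(TX_i)\right].$$

We see by comparison with (133) that $\tilde\theta_{\alpha,n}=\theta_{\alpha,n}$ if $J_T$ is a nonzero constant on $\mathcal{Y}$, because the criterion is then multiplied by a positive constant. ■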

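Next we derive the influence function of the min $\mathfrak{D}_\alpha$-estimators $\theta_{\alpha,n}$ of (133). As before, we use

$$s_\theta=\frac{d}{d\theta}\ln p_\theta\quad\text{and}\quad\dot s_\theta=\left(\frac{d}{d\theta}\right)^{t}s_\theta.$$

It holds $\theta_{\alpha,n}=T_\alpha(P_n)$ where $T_\alpha(Q)$ for $Q\in\mathcal{Q}$ solves the equation $\Psi_\alpha(Q,\theta)=Q\cdot\psi_\alpha(x,\theta)=0$ for

$$\psi_\alpha(x,\theta)=p_\theta^\alpha(x)\,s_\theta(x)-P_\theta\cdot p_\theta^\alpha s_\theta. \tag{138}$$

Since

$$\dot\psi_\alpha(x,\theta)=\left(\frac{d}{d\theta}\right)^{t}\psi_\alpha(x,\theta)=\Pi_{\alpha,\theta}(x)-P_\theta\cdot\left(\Pi_{\alpha,\theta}+p_\theta^\alpha s_\theta s_\theta^t\right) \tag{139}$$

for

$$\Pi_{\alpha,\theta}=p_\theta^\alpha\left(\alpha\,s_\theta s_\theta^t+\dot s_\theta\right), \tag{140}$$

the matrix (24) is given for all $Q\in\mathcal{P}^+$ by the formula

$$I_\alpha(Q)=Q\cdot\Pi_{\alpha,\tau_\alpha}(x)-P_{\tau_\alpha}\cdot\left(\Pi_{\alpha,\tau_\alpha}+p_{\tau_\alpha}^\alpha s_{\tau_\alpha}s_{\tau_\alpha}^t\right)\quad\text{for }\tau_\alpha=T_\alpha(Q)\in\Theta. \tag{141}$$

In particular,

$$I_\alpha(\theta)=I_\alpha(P_\theta)=-P_\theta\cdot p_\theta^\alpha s_\theta s_\theta^t. \tag{142}$$

By combining (138), (141) and (142) with Theorem 1.1 and Corollary 1.1, and taking into account the Fisher consistency in Theorem 3.1, we obtain the following extension of the influence function obtained in §3.3 of Basu et al. (1998) to arbitrary observation spaces $(\mathcal{X},\mathcal{A})$.

Theorem 3.1.5. If the influence function (21) at $Q\in\mathcal{P}^+$ or $P_\theta\in\mathcal{P}$ exists for some min $\mathfrak{D}_\alpha$-estimator $\theta_{\alpha,n}=T_\alpha(P_n)$ then it is given by the formula

$$\mathrm{IF}(x;T_\alpha,Q)=-I_\alpha(Q)^{-1}\left[p_{\tau_\alpha}^\alpha(x)\,s_{\tau_\alpha}(x)-P_{\tau_\alpha}\cdot p_{\tau_\alpha}^\alpha s_{\tau_\alpha}\right]\quad\text{for }\tau_\alpha=T_\alpha(Q) \tag{143}$$

or

$$\mathrm{IF}(x;T_\alpha,\theta)=-I_\alpha(\theta)^{-1}\left[p_\theta^\alpha(x)\,s_\theta(x)-P_\theta\cdot p_\theta^\alpha s_\theta\right] \tag{144}$$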

respectively.

3.2 Applications in the normal family

Consider the general normal family of Example 2.1.1. By (135), the min $\mathfrak{D}_\alpha$-estimator $\theta_{\alpha,n}=(\mu_{\alpha,n},\sigma_{\alpha,n})$ is the MLE given by (64) when $\alpha=0$. Since

$$\int p_{\mu,\sigma}^{1+\alpha}\,dx=\frac{(1+\alpha)^{-1/2}}{(2\pi\sigma^2)^{\alpha/2}} \tag{145}$$

we see from (135) that the min $\mathfrak{D}_\alpha$-estimates are for $\alpha>0$ given by

$$(\mu_{\alpha,n},\sigma_{\alpha,n})=\operatorname*{argmax}_{\mu,\sigma}\left[\frac{1}{n\alpha}\sum_{i=1}^n\frac{\exp\{-\alpha(X_i-\mu)^2/2\sigma^2\}}{(2\pi\sigma^2)^{\alpha/2}}-\frac{(1+\alpha)^{-3/2}}{(2\pi\sigma^2)^{\alpha/2}}\right]$$

$$=\operatorname*{argmax}_{\mu,\sigma}\frac{1}{n\sigma^\alpha}\sum_{i=1}^n\left(\exp\left\{-\frac{\alpha(X_i-\mu)^2}{2\sigma^2}\right\}-\frac{\alpha}{(1+\alpha)^{3/2}}\right). \tag{146}$$

Notice that in practical applications, the trivial "solutions" $(\mu_{\alpha,n},\sigma_{\alpha,n})=(\max_iX_i,0)$ can be avoided by restricting the maximization to scales bounded away from zero.
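The joint criterion for normal location and scale, $\sigma^{-\alpha}\left[\frac1n\sum_i\exp\{-\alpha(X_i-\mu)^2/2\sigma^2\}-\alpha/(1+\alpha)^{3/2}\right]$, can be maximized numerically with the scale restricted to a grid bounded away from zero. The sketch below is only an illustration under assumptions that are not part of the paper: a coarse grid search (any optimizer could be substituted), a lower scale bound of 0.2, and an artificial "sample" made of standard normal quantiles. The helper names `power_criterion` and `grid_argmax` are hypothetical.

```python
import math
from statistics import NormalDist


def power_criterion(mu, sigma, xs, alpha):
    """Criterion of the normal power pseudodistance estimator for alpha > 0."""
    mean_kernel = sum(math.exp(-alpha * (x - mu) ** 2 / (2 * sigma ** 2)) for x in xs) / len(xs)
    return (mean_kernel - alpha / (1 + alpha) ** 1.5) / sigma ** alpha


def grid_argmax(xs, alpha, mus, sigmas):
    """Return the grid point (mu, sigma) with the largest criterion value."""
    return max(((m, s) for m in mus for s in sigmas),
               key=lambda ms: power_criterion(ms[0], ms[1], xs, alpha))


# Artificial data: midpoint quantiles of N(0, 1), so the estimate should be near (0, 1).
n = 100
xs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

mus = [-1 + 0.05 * k for k in range(41)]       # location grid on [-1, 1]
sigmas = [0.2 + 0.05 * k for k in range(57)]   # scale grid on [0.2, 3.0], bounded away from 0

mu_hat, sigma_hat = grid_argmax(xs, 0.5, mus, sigmas)
```

The lower bound on the scale grid plays the role of the restriction mentioned above; without it the criterion can be driven to infinity by letting the scale shrink at a data point.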

Example 3.2.1: Power pseudodistance estimators of location. Consider the normal family $\mathcal{P}=\{P_\mu:\mu\in\mathbb{R}\}$ of Example 2.1.2 where $P_\mu$ are given by the densities $p_\mu(x)=p(x-\mu)$ for the standard normal density $p(x)$. This family satisfies the condition of the formula (136), so that from (133) or (136) we obtain the min $\mathfrak{D}_\alpha$-estimators $\mu_{\alpha,n}=T_\alpha(P_n)$ of location $\mu\in\mathbb{R}$ in this family given by

$$\mu_{\alpha,n}=\operatorname*{argmax}_\mu\begin{cases}\dfrac1n\displaystyle\sum_{i=1}^n\exp\{-\alpha(X_i-\mu)^2/2\}&\text{if }\alpha>0\\[3mm]-\dfrac1n\displaystyle\sum_{i=1}^n(X_i-\mu)^2&\text{if }\alpha=0.\end{cases} \tag{147}$$

Equivalently, they can be obtained by inserting $\sigma=1$ in (146). If $\alpha=0$ then $\mu_{0,n}$ is the standard sample mean.
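For $\alpha>0$ the location criterion $\frac1n\sum_i\exp\{-\alpha(X_i-\mu)^2/2\}$ has the stationarity condition $\sum_i(X_i-\mu)\exp\{-\alpha(X_i-\mu)^2/2\}=0$, which says that $\mu$ is a weighted mean of the observations with weights $\exp\{-\alpha(X_i-\mu)^2/2\}$. The sketch below iterates this fixed-point relation. The iteration scheme, the median starting value and the toy sample are assumptions of this illustration and are not taken from the paper; the iteration finds a stationary point near the starting value, not necessarily the global maximizer.

```python
import math
from statistics import median


def location_estimate(xs, alpha, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for a stationary point of mu -> mean_i exp{-alpha (x_i - mu)^2 / 2}."""
    mu = median(xs)  # robust starting value (an assumption of this sketch)
    for _ in range(max_iter):
        weights = [math.exp(-alpha * (x - mu) ** 2 / 2) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


sample = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]        # five clean points and one gross outlier
mu_mean = location_estimate(sample, alpha=0.0)    # alpha = 0: all weights equal 1, the sample mean
mu_robust = location_estimate(sample, alpha=0.5)  # alpha > 0: the outlier is downweighted
```

With $\alpha=0$ the outlier pulls the estimate to the sample mean, while for $\alpha=0.5$ its weight is numerically negligible and the estimate stays at the centre of the clean points.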

The estimators of location (147) were introduced and studied as part of a larger class of estimators by Vajda (1986, 1989a,b). He proved that if the observations are generated by $Q_{\mu_0}\in\mathcal{P}^+$ with density $q(x-\mu_0)$ for unimodal $q(x)$ symmetric about $x=0$ then these estimators consistently estimate $\mu_0$. For $q$ differentiable with derivative $q'$ he found the influence functions

$$\mathrm{IF}(x;T_\alpha,q)=\frac{-x\exp\{-\alpha x^2/2\}}{\int x\exp\{-\alpha x^2/2\}\,q'(x)\,dx}\quad\text{for }\alpha\ge0. \tag{148}$$

This formula follows also from (142) and (143) where in this case

$$s_\mu(x)=x-\mu,\quad\Pi_{\alpha,\mu}=p_\mu^\alpha\left[\alpha(x-\mu)^2-1\right]\quad\text{and}\quad P_\mu\cdot p_\mu^\alpha s_\mu=0. \tag{149}$$

Indeed, (149) implies $P_{\mu_0}\cdot p_{\mu_0}^\alpha s_{\mu_0}=0$ and $p_0^\alpha(x)\,s_0(x)=x\exp\{-\alpha x^2/2\}\,(2\pi)^{-\alpha/2}$, so that the numerator in (148) follows from (143). Using the identities

$$P_\mu\cdot\left(\Pi_{\alpha,\mu}+p_\mu^\alpha s_\mu^2\right)=\int p_\mu^{1+\alpha}\left[(1+\alpha)(x-\mu)^2-1\right]dx=0$$

and

$$\int x\,p_0^\alpha(x)\,q'(x)\,dx+\int\left\{p_0^\alpha(x)+x\,(p_0^\alpha)'(x)\right\}q(x)\,dx=0$$

we get from (149) and (141)

$$I_\alpha(Q_{\mu_0})=\int p_0^\alpha(x)\left[\alpha x^2-1\right]q(x)\,dx=\int x\,p_0^\alpha(x)\,q'(x)\,dx$$

so that the denominator in (148) follows from (143).

The particular influence curve obtained in (148) for $\alpha=1/5$ very closely and smoothly approximates the trapezoidal $\mathrm{IF}(x;25A,q)$ of the estimator referred to as the best under the name Hampel's choice 25A in the Princeton Robustness Study of Andrews et al. (1972). This study as well as the estimator of location 25A were influential and frequently cited in the first decades of robust statistics. The asymptotic normality

$$\sqrt n\,(\mu_{\alpha,n}-\mu_0)\to N(0,\sigma_\alpha^2)\quad\text{for }\sigma_\alpha^2=\int\mathrm{IF}^2(x;T_\alpha,q)\,q(x)\,dx$$

in the data generating model $Q_{\mu_0}$ was established in Vajda (1986, 1989a,b) too, and the simulations presented there demonstrated that the estimator $T_{1/5}$ outperformed the set of 6 robust estimators of location including those considered as the most prominent at that time.



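Example 3.2.2: Power pseudodistance estimators of scale. Consider the normal family $\mathcal{P}=\{P_\sigma:\sigma>0\}$ of Example 2.1.3 where $P_\sigma$ are given by the densities $p_\sigma(x)=p(x/\sigma)/\sigma$ for the standard normal density $p(x)$. If $\alpha=0$ then, by (135), the min $\mathfrak{D}_\alpha$-estimator $\sigma_{\alpha,n}=T_\alpha(P_n)$ is the standard MLE of scale given in (64). Otherwise we get from (146) by inserting $\mu=0$

$$\sigma_{\alpha,n}=\operatorname*{argmax}_\sigma\frac{1}{\sigma^\alpha}\left[\frac1n\sum_{i=1}^n\exp\left\{-\frac{\alpha X_i^2}{2\sigma^2}\right\}-\frac{\alpha}{(1+\alpha)^{3/2}}\right],\qquad\alpha>0.$$

Taking into account here

$$\frac1n\sum_{i=1}^n\exp\left\{-\frac{\alpha X_i^2}{2\sigma^2}\right\}=\int\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}dP_n(x)$$

we find the more general formula

$$T_\alpha(Q)=\operatorname*{argmax}_\sigma M_\alpha(Q,\sigma)\quad\text{for }Q\in\mathcal{P}^+$$

where

$$M_\alpha(Q,\sigma)=\frac{1}{\sigma^\alpha}\left[\int\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}dQ(x)-\frac{\alpha}{(1+\alpha)^{3/2}}\right]. \tag{150}$$

By (20) and (22),

$$\psi_\alpha(x,\sigma)=\frac{d}{d\sigma}M_\alpha(\delta_x,\sigma)=\frac{d}{d\sigma}\,\frac{1}{\sigma^\alpha}\left[\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}-\frac{\alpha}{(1+\alpha)^{3/2}}\right]$$

$$=\frac{\alpha}{\sigma^{1+\alpha}}\left[\left(\left(\frac x\sigma\right)^2-1\right)\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}+\frac{\alpha}{(1+\alpha)^{3/2}}\right]. \tag{151}$$

The last formula will be used to evaluate the influence function. Before doing so we shall verify it by checking the Fisher consistency condition

$$P_{\sigma_0}\cdot\psi_\alpha(x,\sigma)=0\quad\text{if and only if}\quad\sigma=\sigma_0$$

guaranteed by Theorem 3.1. We shall use the substitution

$$s_\alpha=\frac{\sigma\sigma_0}{\sqrt{\sigma^2+\alpha\sigma_0^2}} \tag{152}$$

and the formula

$$\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}p_{\sigma_0}(x)=\frac{s_\alpha}{\sigma_0}\,p_{s_\alpha}(x). \tag{153}$$

Then

$$\int\left(\left(\frac x\sigma\right)^2-1\right)\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}p_{\sigma_0}(x)\,dx=\frac{s_\alpha}{\sigma_0}\int\left(\left(\frac x\sigma\right)^2-1\right)p_{s_\alpha}(x)\,dx=\frac{s_\alpha}{\sigma_0}\left[\left(\frac{s_\alpha}{\sigma}\right)^2-1\right] \tag{154}$$

where

$$\frac{s_\alpha}{\sigma_0}\left[\left(\frac{s_\alpha}{\sigma}\right)^2-1\right]=\frac{(1-\alpha)-(\sigma/\sigma_0)^2}{(\sigma/\sigma_0)^2\left(1+(\sigma_0/\sigma)^2\alpha\right)^{3/2}}.$$

Consequently $P_{\sigma_0}\cdot\psi_\alpha(x,\sigma)=0$ is equivalent to

$$\frac{(1-\alpha)-(\sigma/\sigma_0)^2}{(\sigma/\sigma_0)^2\left(1+(\sigma_0/\sigma)^2\alpha\right)^{3/2}}+\frac{\alpha}{(1+\alpha)^{3/2}}=0$$

which holds if and only if $\sigma_0=\sigma$, which positively verifies (151).

From (151) we get

$$\dot\psi_\alpha(x,\sigma)=\frac{d}{d\sigma}\psi_\alpha(x,\sigma)=\frac{\alpha}{\sigma^{2+\alpha}}\left[\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}\left(\alpha\left(\frac x\sigma\right)^4-(3+2\alpha)\left(\frac x\sigma\right)^2+1+\alpha\right)-\frac{\alpha}{\sqrt{1+\alpha}}\right]. \tag{155}$$

Denoting for brevity as before

$$\tau_\alpha=T_\alpha(Q)\quad\text{for }Q\in\mathcal{P}^+$$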



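we obtain from (151), (155) and Theorem 1.1 the influence functions of the min $\mathfrak{D}_\alpha$-estimators $\sigma_{\alpha,n}=T_\alpha(P_n)$ at $Q$ for all $\alpha\ge0$ in the form

$$\mathrm{IF}(x;T_\alpha,Q)=-\frac{\psi_\alpha(x,\tau_\alpha)}{\int\dot\psi_\alpha(x,\tau_\alpha)\,dQ}=-\frac{\tau_\alpha}{\Gamma_\alpha(Q)}\left[\left(\left(\frac{x}{\tau_\alpha}\right)^2-1\right)\exp\left\{-\frac{\alpha x^2}{2\tau_\alpha^2}\right\}+\frac{\alpha}{(1+\alpha)^{3/2}}\right] \tag{156}$$

where $\Gamma_\alpha(Q)$ denotes the integral

$$\Gamma_\alpha(Q)=\int\left[\exp\left\{-\frac{\alpha x^2}{2\tau_\alpha^2}\right\}\left(\alpha\left(\frac{x}{\tau_\alpha}\right)^4-(3+2\alpha)\left(\frac{x}{\tau_\alpha}\right)^2+1+\alpha\right)-\frac{\alpha}{\sqrt{1+\alpha}}\right]dQ. \tag{157}$$

For $Q=P_\sigma$ the Fisher consistency implies $\tau_\alpha:=T_\alpha(P_\sigma)=\sigma$ so that (156) and (157) imply

$$\mathrm{IF}(x;T_\alpha,\sigma)=-\frac{\sigma}{\Gamma_\alpha(P_\sigma)}\left[\left(\left(\frac x\sigma\right)^2-1\right)\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}+\frac{\alpha}{(1+\alpha)^{3/2}}\right]$$

where the integral $\Gamma_\alpha(P_\sigma)$ reduces to

$$\Gamma_\alpha(P_\sigma)=\frac{1}{\sqrt{1+\alpha}}\left[\frac{3\alpha}{(1+\alpha)^2}-\frac{3+2\alpha}{1+\alpha}+1+\alpha\right]-\frac{\alpha}{\sqrt{1+\alpha}}$$

$$=\frac{1}{\sqrt{1+\alpha}}\cdot\frac{3\alpha-(3+2\alpha)(1+\alpha)+(1+\alpha)^2}{(1+\alpha)^2}=-\frac{\alpha^2+2}{(1+\alpha)^{5/2}}.$$

Hence for all $\alpha\ge0$

$$\mathrm{IF}(x;T_\alpha,\sigma)=\frac{(1+\alpha)^{5/2}\,\sigma}{\alpha^2+2}\left[\left(\left(\frac x\sigma\right)^2-1\right)\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}+\frac{\alpha}{(1+\alpha)^{3/2}}\right]. \tag{158}$$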



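Conclusion 3.2.1. The min $\mathfrak{D}_\alpha$-estimators $\sigma_{\alpha,n}=T_\alpha(P_n)$ of normal scale are for all $\alpha>0$ robust in the sense that their absolute sensitivity to the observations $x\in\mathbb{R}$ represented by

$$\sup_{x\in\mathbb{R}}|\mathrm{IF}(x;T_\alpha,\sigma)|=\max\left\{-\mathrm{IF}(0;T_\alpha,\sigma),\ \mathrm{IF}(\sigma_\alpha;T_\alpha,\sigma)\right\}\quad\text{for }\sigma_\alpha=\sigma\sqrt{\frac{2+\alpha}{\alpha}}$$

is bounded (cf. Hampel et al. (1986)). However, they are not insensitive against extreme outliers because

$$\lim_{|x|\to\infty}\mathrm{IF}(x;T_\alpha,\sigma)=\mathrm{IF}(\sigma;T_\alpha,\sigma)=\sigma\,\frac{\alpha(1+\alpha)}{\alpha^2+2}. \tag{159}$$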
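The statements of Conclusion 3.2.1 can be illustrated numerically. The sketch below assumes the closed form of the influence function of the power pseudodistance scale estimator at the normal model, $\mathrm{IF}(x;T_\alpha,\sigma)=\frac{(1+\alpha)^{5/2}\sigma}{\alpha^2+2}\left[((x/\sigma)^2-1)\exp\{-\alpha x^2/2\sigma^2\}+\alpha/(1+\alpha)^{3/2}\right]$ (cf. (158)). The values $\alpha=0.5$, $\sigma=1$ and the evaluation grid are arbitrary illustrative choices.

```python
import math


def if_power_scale(x, sigma, alpha):
    """Influence function of the power pseudodistance scale estimator at N(0, sigma^2)."""
    u = (x / sigma) ** 2
    bracket = (u - 1) * math.exp(-alpha * u / 2) + alpha / (1 + alpha) ** 1.5
    return (1 + alpha) ** 2.5 * sigma / (alpha ** 2 + 2) * bracket


alpha, sigma = 0.5, 1.0

# Point where the influence function attains its maximum.
x_peak = sigma * math.sqrt((2 + alpha) / alpha)

# Supremum of |IF| over a fine grid on [0, 20]; the function is even in x.
grid = [0.01 * k for k in range(2001)]
sup_abs = max(abs(if_power_scale(x, sigma, alpha)) for x in grid)
claimed_sup = max(-if_power_scale(0.0, sigma, alpha), if_power_scale(x_peak, sigma, alpha))

# Nonzero limit for extreme outliers, which coincides with the value at x = sigma.
tail_limit = sigma * alpha * (1 + alpha) / (alpha ** 2 + 2)
far_value = if_power_scale(1e3, sigma, alpha)
value_at_sigma = if_power_scale(sigma, sigma, alpha)
```

The influence function is bounded but levels off at a positive constant, so an arbitrarily distant outlier keeps a fixed nonzero influence on the estimate.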



3.3 Rényi pseudodistance estimators

In this subsection we propose for probability measures $P\in\mathcal{P}$ and $Q\in\mathcal{P}^+$ considered in the previous sections a family of pseudodistances $\mathfrak{R}_\alpha(P,Q)$ of a Rényi type of orders $\alpha\ge0$ which are not of the integral type as $\mathfrak{D}_\psi(P,Q)$ of (105) or $\mathfrak{D}_\alpha(P,Q)$ of (127). Our proposal is based on the following theorem where

$$\mathfrak{R}^0_\alpha(P)=\frac{1}{1+\alpha}\ln(P\cdot p^\alpha)\quad\text{and}\quad\mathfrak{R}^1_\alpha(Q)=\frac{1}{\alpha(1+\alpha)}\ln(Q\cdot q^\alpha). \tag{160}$$

Theorem 3.3.1. Let the condition (131) hold for some $\beta>0$. Then for all $0<\alpha<\beta$

$$\mathfrak{R}_\alpha(P,Q)=\frac{1}{1+\alpha}\ln(P\cdot p^\alpha)+\frac{1}{\alpha(1+\alpha)}\ln(Q\cdot q^\alpha)-\frac1\alpha\ln(Q\cdot p^\alpha) \tag{161}$$

is a family of pseudodistances decomposable in the sense

$$\mathfrak{R}_\alpha(P,Q)=\mathfrak{R}^0_\alpha(P)+\mathfrak{R}^1_\alpha(Q)-\frac1\alpha\ln(Q\cdot p^\alpha) \tag{162}$$

for $\mathfrak{R}^0_\alpha(P)$, $\mathfrak{R}^1_\alpha(Q)$ given by (160), and satisfying the limit relation

$$\mathfrak{R}_\alpha(P,Q)\to\mathfrak{R}_0(P,Q):=Q\cdot\ln q-Q\cdot\ln p\quad\text{for }\alpha\downarrow0. \tag{163}$$



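Proof. Under (131), the expressions $\ln(Q\cdot q^\alpha)$, $\ln(Q\cdot p^\alpha)$ and $Q\cdot\ln p$ appearing in (161) are finite so that the expressions $\mathfrak{R}_\alpha(P,Q)$ are well defined by (161). Taking $\alpha>0$ and substituting

$$s=\frac{p^\alpha}{\left(\int p^{\alpha a}\,d\lambda\right)^{1/a}},\quad t=\frac{q}{\left(\int q^b\,d\lambda\right)^{1/b}}\quad\text{and}\quad a=\frac{1+\alpha}{\alpha},\ b=1+\alpha$$

in the inequality (125), and integrating both sides by $\lambda$, we obtain the Hölder inequality

$$\int p^\alpha q\,d\lambda\le\left(\int p^{1+\alpha}\,d\lambda\right)^{\alpha/(1+\alpha)}\left(\int q^{1+\alpha}\,d\lambda\right)^{1/(1+\alpha)}$$

with the equality iff $p^{\alpha a}=q^b$ $\lambda$-a.s., i.e. iff $p=q$ $\lambda$-a.s. Since the expression (161) satisfies for $\alpha>0$ the relation

$$\mathfrak{R}_\alpha(P,Q)=\frac1\alpha\left\{\ln\left[\left(\int p^{1+\alpha}\,d\lambda\right)^{\alpha/(1+\alpha)}\left(\int q^{1+\alpha}\,d\lambda\right)^{1/(1+\alpha)}\right]-\ln\int p^\alpha q\,d\lambda\right\}, \tag{164}$$

we see that $\mathfrak{R}_\alpha(P,Q)$ is a pseudodistance on the space $\mathcal{P}\otimes\mathcal{P}^+$. The decomposability in the sense of (162) on this space is obvious and the limit relation

$$\mathfrak{R}_0(P,Q)=\lim_{\alpha\downarrow0}\mathfrak{R}_\alpha(P,Q)$$

can be proved in a similar manner as in the proof of Theorem 3.1.3. ■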
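The pseudodistance property and the limit $\alpha\downarrow0$ can be illustrated for two normal densities. The sketch below evaluates $\frac{1}{1+\alpha}\ln\int p^{1+\alpha}d\lambda+\frac{1}{\alpha(1+\alpha)}\ln\int q^{1+\alpha}d\lambda-\frac1\alpha\ln\int p^\alpha q\,d\lambda$ (cf. (161)) by a plain trapezoidal rule. The integration range, the step count and the particular pair $N(0,1)$, $N(1,4)$ are assumptions of this illustration, and the helper names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, lo=-30.0, hi=30.0, steps=12000):
    """Trapezoidal rule on [lo, hi]; adequate here because the integrands decay like Gaussians."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, steps))
    return total * h


def renyi_pseudodistance(p, q, alpha):
    """Renyi pseudodistance of order alpha > 0 between densities p (model) and q (data)."""
    pp = integrate(lambda x: p(x) ** (1 + alpha))   # P . p^alpha
    qq = integrate(lambda x: q(x) ** (1 + alpha))   # Q . q^alpha
    qp = integrate(lambda x: p(x) ** alpha * q(x))  # Q . p^alpha
    return (math.log(pp) / (1 + alpha)
            + math.log(qq) / (alpha * (1 + alpha))
            - math.log(qp) / alpha)


p = lambda x: normal_pdf(x, 0.0, 1.0)
q = lambda x: normal_pdf(x, 1.0, 2.0)

r_distinct = renyi_pseudodistance(p, q, 0.5)   # strictly positive for p != q
r_same = renyi_pseudodistance(p, p, 0.5)       # zero for p == q
r_small = renyi_pseudodistance(p, q, 0.001)    # close to the limit Q.ln q - Q.ln p

# Closed form of Q.ln q - Q.ln p for Q = N(1, 4), P = N(0, 1).
limit_value = math.log(0.5) + (4.0 + 1.0) / 2.0 - 0.5
```

For normal densities all three integrals also have closed forms, so the quadrature is used here only to keep the sketch independent of the family.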

There is some similarity between the decomposable pseudodistances $\mathfrak{R}_\alpha(P,Q)$, $\alpha>0$ of (161) and the Rényi divergences

$$R_\alpha(P,Q)=\frac{1}{\alpha-1}\ln\left(Q\cdot(p/q)^\alpha\right),\qquad\alpha>0\quad(\text{cf. Rényi (1961)}).$$

Namely, rewriting the formula (164) into the form

$$\mathfrak{R}_\alpha(P,Q)=\frac{1}{\alpha+1}\ln\frac{P\cdot p^\alpha}{Q\cdot p^\alpha}+\frac{1}{\alpha(\alpha+1)}\ln\frac{Q\cdot q^\alpha}{Q\cdot p^\alpha}$$

and replacing the ratios of expectations by the expectations of ratios, we get for $\alpha>0$ the relation

$$\mathfrak{R}_\alpha(P,Q)=\frac{1}{\alpha+1}\ln\left(Q\cdot(p/q)\right)+\frac{1}{\alpha(\alpha+1)}\ln\left(Q\cdot(q/p)^\alpha\right)=\frac{1}{\alpha+1}R_{\alpha+1}(Q,P) \tag{165}$$

which can be extended to $\alpha=0$ by taking on both sides the limits for $\alpha\downarrow0$. Therefore the decomposable pseudodistances (161) are modified Rényi divergences and as such, they are called Rényi pseudodistances.

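Similarly as earlier in this section, we are interested in the estimators obtained by replacing the hypothetical distribution $P_{\theta_0}$ in the $\mathfrak{R}_\alpha$-pseudodistances $\mathfrak{R}_\alpha(P_\theta,P_{\theta_0})$ by the empirical distribution $P_n$. In other words, we are interested in the family of Rényi pseudodistance estimators of orders $0\le\alpha<\beta$ (in symbols, min $\mathfrak{R}_\alpha$-estimators) defined as $\theta_{n,\alpha}=T_\alpha(P_n)$ for $T_\alpha(Q)\in\Theta$ with $Q\in\mathcal{Q}=\mathcal{P}^+\cup\mathcal{P}_{emp}$ satisfying the condition

$$T_\alpha(Q)=\begin{cases}\operatorname*{argmin}_\theta\left[\dfrac{1}{1+\alpha}\ln(P_\theta\cdot p_\theta^\alpha)-\dfrac1\alpha\ln(Q\cdot p_\theta^\alpha)\right]&\text{if }0<\alpha<\beta\\[3mm]\operatorname*{argmin}_\theta\left[-Q\cdot\ln p_\theta\right]&\text{if }\alpha=0.\end{cases} \tag{166}$$

The upper formula is for

$$C_\theta(\alpha)=\left(P_\theta\cdot p_\theta^\alpha\right)^{\alpha/(1+\alpha)}=\left(\int p_\theta^{1+\alpha}\,d\lambda\right)^{\alpha/(1+\alpha)} \tag{167}$$

equivalent to

$$T_\alpha(Q)=\operatorname*{argmax}_\theta M_\alpha(Q,\theta)\quad\text{for }M_\alpha(Q,\theta)=\frac{Q\cdot p_\theta^\alpha}{C_\theta(\alpha)}. \tag{168}$$

Alternatively, we can write

$$\theta_{n,\alpha}=\begin{cases}\operatorname*{argmax}_\theta\,C_\theta(\alpha)^{-1}\dfrac1n\displaystyle\sum_{i=1}^np_\theta^\alpha(X_i)&\text{if }0<\alpha<\beta\\[3mm]\operatorname*{argmax}_\theta\dfrac1n\displaystyle\sum_{i=1}^n\ln p_\theta(X_i)&\text{if }\alpha=0.\end{cases} \tag{169}$$

For $\alpha\approx0$ the approximations $C_\theta(\alpha)\approx1$ and

$$\frac1\alpha\left(\frac1n\sum_{i=1}^np_\theta^\alpha(X_i)-1\right)=\frac{1}{n\alpha}\sum_{i=1}^n\left(p_\theta^\alpha(X_i)-1\right)\approx\frac1n\sum_{i=1}^n\ln p_\theta(X_i)$$

indicate that the upper criterion function in (169) tends to the lower MLE criterion for $\alpha\downarrow0$. If $C_\theta(\alpha)$ does not depend on $\theta$ then the min $\mathfrak{R}_\alpha$-estimates reduce to the min $\mathfrak{D}_\alpha$-estimates considered in (136) of Remark 3.1.2, i.e.,

$$\theta_{n,\alpha}=\operatorname*{argmax}_\theta\begin{cases}\dfrac1n\displaystyle\sum_{i=1}^np_\theta^\alpha(X_i)&\text{if }0<\alpha<\beta\\[3mm]\dfrac1n\displaystyle\sum_{i=1}^n\ln p_\theta(X_i)&\text{if }\alpha=0.\end{cases} \tag{170}$$

If the extremal points of all functions in (169) are in a compact set of $\Theta$ then

$$\lim_{\alpha\downarrow0}\theta_{n,\alpha}=\theta_{n,0}. \tag{171}$$

In the next theorem and its proof we use the auxiliary expressions

$$s_\theta=\frac{d}{d\theta}\ln p_\theta,\qquad\dot s_\theta=\left(\frac{d}{d\theta}\right)^{t}s_\theta$$

and

$$c_\theta(\alpha)=\frac{\int p_\theta^{1+\alpha}s_\theta\,d\lambda}{\int p_\theta^{1+\alpha}\,d\lambda},\qquad\dot c_\theta(\alpha)=\left(\frac{d}{d\theta}\right)^{t}c_\theta(\alpha),\qquad\tau_\alpha=T_\alpha(Q).$$

Theorem 3.3.3. If the influence function (21) at $Q\in\mathcal{P}^+$ or $P_\theta\in\mathcal{P}$ exists for some min $\mathfrak{R}_\alpha$-estimator $\theta_{\alpha,n}=T_\alpha(P_n)$ then it is given by the formula

$$\mathrm{IF}(x;T_\alpha,Q)=-I_\alpha(Q)^{-1}\left[p_{\tau_\alpha}^\alpha(x)\left(s_{\tau_\alpha}(x)-c_{\tau_\alpha}(\alpha)\right)\right] \tag{172}$$

or

$$\mathrm{IF}(x;T_\alpha,\theta)=-I_\alpha(\theta)^{-1}\left[p_\theta^\alpha(x)\left(s_\theta(x)-c_\theta(\alpha)\right)\right] \tag{173}$$

for the matrices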

I a (Q) = J [s Ta ~ KM - ap a Ta (s Ta - c r ») (s Ta - c Ta (a)) 4 ] tfdQ (174) 

I a (0) = J [h - c e (a) - apg (s e - c e {a)) (s e - cg(a)f] pl +a d\ (175) 
respectively. 

Proof. By (11681) . T a (Q) for Q G Q minimizes Q ■ (pg/Cg(a)) , i.e. solves the equation 
**{Q,0) = Q ■ 1>(x,B) = for 

* a (M) = MS x ,0) = * -if- = 

dOC e (a) Cg(a) 

Further, 

dV 



or 



Cff(a) := C^a) = aC e (a)4(a) 



so that 



_ C e {a) [a 2 Pgsl (sg - cg(a)) + ap^ (s g - c e (a))} - ap^ (s$ - c e (a)) Cg(a) 

Cg(a) 

_ <y 2 PgSg {Sg - Cg(a)) + ap% (sg - Cg(a)) - Q?p%s\ (sg - Cg(a)) Cg(«) 

Cg{a) 

Therefore the matrix (24) is given for all $Q\in\mathcal{P}^+$ by the formula (174) and (27) is given for $P_\theta\in\mathcal{P}$ by (175). The rest is clear from Theorems 1.1 and 3.1, and from Corollary 1.1. ■

3.4 Applications in the normal family

Consider the general normal family of Example 2.1.1 for which the condition (131) is satisfied for all $\beta>0$ and (145) implies

$$C_{\mu,\sigma}(\alpha)=C_\sigma(\alpha)=\left(\frac{(1+\alpha)^{-1/2}}{(2\pi\sigma^2)^{\alpha/2}}\right)^{\alpha/(1+\alpha)}=\frac{\sigma^{-\alpha^2/(1+\alpha)}}{c(\alpha)} \tag{177}$$

for all $\mu\in\mathbb{R}$ and the function

$$c(\alpha)=\left[(1+\alpha)(2\pi)^\alpha\right]^{\alpha/2(1+\alpha)},\qquad\alpha>0.$$

By (169), the min $\mathfrak{R}_\alpha$-estimator $\theta_{\alpha,n}=(\mu_{\alpha,n},\sigma_{\alpha,n})$ is the standard estimator of location and scale given by (64) if $\alpha=0$. For $\alpha>0$ we can use the relation

$$\frac{\sigma^{\alpha^2/(1+\alpha)}}{\sigma^\alpha}=\sigma^{-\alpha/(1+\alpha)}$$

to get from (169) and (177) the highly nonstandard estimator

$$(\mu_{\alpha,n},\sigma_{\alpha,n})=\operatorname*{argmax}_{\mu,\sigma}\frac{1}{n\,\sigma^{\alpha/(1+\alpha)}}\sum_{i=1}^n\exp\left\{-\alpha\,\frac{(X_i-\mu)^2}{2\sigma^2}\right\} \tag{178}$$

which in general differs from the min $\mathfrak{D}_\alpha$-estimator (146), as will be seen in the submodel of scale below. Similarly as in the case of the power pseudodistance estimator (146), the trivial "solutions" $(\mu_{\alpha,n},\sigma_{\alpha,n})=(\max_iX_i,0)$ can be avoided in practical applications by restricting the maximization to scales bounded away from zero.

The next example of the submodel of location illustrates the situation where these two estimators coincide. Obviously, the constants $c_\alpha=c(\alpha)/(2\pi)^{\alpha/2}$ play no role in the maximization and can be replaced by 1.
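The closed form of the normalizing factor, $C_\sigma(\alpha)=\left(\int p_{\mu,\sigma}^{1+\alpha}dx\right)^{\alpha/(1+\alpha)}=\sigma^{-\alpha^2/(1+\alpha)}/c(\alpha)$ with $c(\alpha)=[(1+\alpha)(2\pi)^\alpha]^{\alpha/2(1+\alpha)}$, can be cross-checked against direct numerical integration. The sketch below is only such a check; the trapezoidal rule, the integration range of twelve standard deviations and the test values $\sigma=1.7$, $\alpha=0.3$ are assumptions of this illustration.

```python
import math


def c_fun(alpha):
    """Constant c(alpha) = [(1 + alpha) (2 pi)^alpha]^(alpha / (2 (1 + alpha)))."""
    return ((1 + alpha) * (2 * math.pi) ** alpha) ** (alpha / (2 * (1 + alpha)))


def C_closed(sigma, alpha):
    """Closed form of C_sigma(alpha) in the normal family."""
    return sigma ** (-alpha ** 2 / (1 + alpha)) / c_fun(alpha)


def C_numeric(sigma, alpha, mu=0.0, steps=20000):
    """(integral of p^(1+alpha))^(alpha/(1+alpha)) by the trapezoidal rule."""
    lo, hi = mu - 12 * sigma, mu + 12 * sigma
    h = (hi - lo) / steps

    def f(x):
        pdf = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
        return pdf ** (1 + alpha)

    integral = h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, steps)))
    return integral ** (alpha / (1 + alpha))


gap_centered = abs(C_numeric(1.7, 0.3) - C_closed(1.7, 0.3))
gap_shifted = abs(C_numeric(1.7, 0.3, mu=2.5) - C_closed(1.7, 0.3))  # no dependence on mu
```

The second comparison illustrates that the factor does not depend on the location, which is why the location submodel leads to the same estimators for both families of pseudodistances.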



Example 3.4.1: Rényi pseudodistance estimators of location. The normal family of location introduced in Example 2.1.2 satisfies the condition of the formula (136) so that from (133) or (136) we obtain the same min $\mathfrak{R}_\alpha$-estimators $\mu_{\alpha,n}$ of location $\mu\in\mathbb{R}$ as in (147). Thus everything that was seen in Example 3.2.1 applies to these estimators.

Example 3.4.2: Rényi pseudodistance estimators of scale. Consider the normal model of scale introduced in Example 2.1.3. If $\alpha=0$ then, by (135), the min $\mathfrak{R}_\alpha$-estimator $\sigma_{\alpha,n}=T_\alpha(P_n)$ is the standard MLE of scale given in (64). Otherwise by (178),

$$\sigma_{\alpha,n}=\operatorname*{argmax}_\sigma\frac{1}{n\,\sigma^{\alpha/(1+\alpha)}}\sum_{i=1}^n\exp\left\{-\frac{\alpha X_i^2}{2\sigma^2}\right\},\qquad\alpha>0\quad(\text{cf. (150)}). \tag{179}$$

It is easy to see, e.g. by putting $n=1$ and $\alpha X_1^2=2$, that these estimates differ from the $\mathfrak{D}_\alpha$-estimates of scale given in (150). Here (168) for the Dirac $\delta_x$ implies

$$M_\alpha(\delta_x,\sigma)=\frac{p_\sigma^\alpha(x)}{C_\sigma(\alpha)}=c_\alpha\,\sigma^{-\alpha/(1+\alpha)}\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}$$

and by (20) and (22),

$$\psi_\alpha(x,\sigma)=\frac{d}{d\sigma}M_\alpha(\delta_x,\sigma)=c_\alpha\frac{d}{d\sigma}\left[\sigma^{-\alpha/(1+\alpha)}\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}\right]$$

$$=\frac{\alpha\,c_\alpha}{\sigma^{1+\alpha/(1+\alpha)}}\left[\left(\frac x\sigma\right)^2-\frac{1}{1+\alpha}\right]\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}. \tag{180}$$
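This formula can be verified by checking the Fisher consistency known in general from Theorem 3.1. Using the formulas (153) and (154) we find

$$\int\left[\left(\frac x\sigma\right)^2-\frac{1}{1+\alpha}\right]\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}p_{\sigma_0}(x)\,dx=\frac{\sigma}{\sqrt{\sigma^2+\alpha\sigma_0^2}}\left[\frac{\sigma_0^2}{\sigma^2+\alpha\sigma_0^2}-\frac{1}{1+\alpha}\right].$$

Since the right-hand side is zero if and only if $\sigma=\sigma_0$, the verification is positive.

From (180) we evaluate after some effort the derivative

$$\dot\psi_\alpha(x,\sigma)=\frac{\alpha\,c_\alpha}{\sigma^{2+\alpha/(1+\alpha)}}\,H_\alpha\!\left(\frac x\sigma\right)\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}$$

where

$$H_\alpha\!\left(\frac x\sigma\right)=\alpha\left(\frac x\sigma\right)^4-\frac{5\alpha+3}{1+\alpha}\left(\frac x\sigma\right)^2+\frac{2\alpha+1}{(1+\alpha)^2}. \tag{181}$$

Thus, denoting for brevity

$$\tau_\alpha=T_\alpha(Q)\quad\text{for }Q\in\mathcal{P}^+$$

we obtain from (180), (181) and Theorem 1.1 the influence functions of the min $\mathfrak{R}_\alpha$-estimators $\sigma_{\alpha,n}=T_\alpha(P_n)$ at $Q$ given for all $\alpha\ge0$ by

$$\mathrm{IF}(x;T_\alpha,Q)=-\frac{\psi_\alpha(x,\tau_\alpha)}{\int\dot\psi_\alpha(x,\tau_\alpha)\,dQ}=-\frac{\tau_\alpha}{\Gamma_\alpha(Q)}\left[\left(\frac{x}{\tau_\alpha}\right)^2-\frac{1}{1+\alpha}\right]\exp\left\{-\frac{\alpha x^2}{2\tau_\alpha^2}\right\}$$

where

$$\Gamma_\alpha(Q)=\int H_\alpha\!\left(\frac{x}{\tau_\alpha}\right)\exp\left\{-\frac{\alpha x^2}{2\tau_\alpha^2}\right\}dQ. \tag{182}$$

In the special case $Q=P_\sigma$ the Fisher consistency implies that $\tau_\alpha:=T_\alpha(P_\sigma)=\sigma$. We use the relation

$$\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}p_\sigma(x)=\frac{1}{\sqrt{1+\alpha}}\,p_{\sigma_\alpha}(x)\quad\text{for }\sigma_\alpha=\frac{\sigma}{\sqrt{1+\alpha}}$$

to obtain

$$\Gamma_\alpha(P_\sigma)=\frac{1}{\sqrt{1+\alpha}}\int H_\alpha\!\left(\frac x\sigma\right)p_{\sigma_\alpha}(x)\,dx=\frac{1}{(1+\alpha)^{1/2}}\left[3\alpha\left(\frac{\sigma_\alpha}{\sigma}\right)^4-\frac{5\alpha+3}{1+\alpha}\left(\frac{\sigma_\alpha}{\sigma}\right)^2+\frac{2\alpha+1}{(1+\alpha)^2}\right]$$

$$=\frac{1}{(1+\alpha)^{5/2}}\left[3\alpha-(5\alpha+3)+2\alpha+1\right]=-\frac{2}{(1+\alpha)^{5/2}}$$

independently of $\sigma>0$. Therefore at the normal scale model $P_\sigma$ we get for all $\alpha\ge0$ the influence functions

$$\mathrm{IF}(x;T_\alpha,P_\sigma)=\frac{(1+\alpha)^{5/2}\,\sigma}{2}\left[\left(\frac x\sigma\right)^2-\frac{1}{1+\alpha}\right]\exp\left\{-\frac{\alpha x^2}{2\sigma^2}\right\}. \tag{183}$$

It is easy to verify that this is the influence function also in the MLE case $\alpha=0$.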
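The redescending character of this influence function can be seen numerically. The sketch below assumes the closed form $\mathrm{IF}(x;T_\alpha,P_\sigma)=\frac{(1+\alpha)^{5/2}\sigma}{2}\left[(x/\sigma)^2-\frac{1}{1+\alpha}\right]\exp\{-\alpha x^2/2\sigma^2\}$ (cf. (183)) and compares its value at $\alpha=0$ with the familiar influence function $\frac\sigma2\left[(x/\sigma)^2-1\right]$ of the MLE of normal scale. The evaluation points are arbitrary illustrative choices.

```python
import math


def if_renyi_scale(x, sigma, alpha):
    """Influence function of the Renyi pseudodistance scale estimator at N(0, sigma^2)."""
    u = (x / sigma) ** 2
    return (1 + alpha) ** 2.5 * sigma / 2 * (u - 1 / (1 + alpha)) * math.exp(-alpha * u / 2)


def if_mle_scale(x, sigma):
    """Influence function of the MLE of normal scale, for comparison."""
    return sigma / 2 * ((x / sigma) ** 2 - 1)


# alpha = 0 reproduces the (unbounded) MLE influence function.
mle_gap = max(abs(if_renyi_scale(x, 2.0, 0.0) - if_mle_scale(x, 2.0))
              for x in (0.0, 0.5, 1.0, 3.0, 10.0))

# alpha > 0: the influence of a distant outlier vanishes.
far_influence = if_renyi_scale(100.0, 1.0, 0.5)

# Sign change at x = sigma / sqrt(1 + alpha).
zero_crossing = if_renyi_scale(1.0 / math.sqrt(1.5), 1.0, 0.5)
```

In contrast with the power pseudodistance estimator of scale, whose influence function tends to a positive constant, the value for a distant observation is numerically zero here.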



Conclusion 3.4.1. The min $\mathfrak{R}_\alpha$-estimators $\sigma_{\alpha,n}=T_\alpha(P_n)$ of normal scale are for all $\alpha>0$ robust in the sense that their influence functions are bounded. They are more robust against distant outliers than the corresponding min $\mathfrak{D}_\alpha$-estimators studied in Subsections 3.1 and 3.2 because

$$\lim_{|x|\to\infty}\mathrm{IF}(x;T_\alpha,P_\sigma)=0\quad(\text{cf. (183)}). \tag{184}$$



Problem 3.4.1. Compare by simulations the mean squared errors of the min $\mathfrak{D}_\alpha$-estimators 
and min $\mathfrak{R}_\alpha$-estimators of scale in contaminated normal scale models 

$$(1-\varepsilon)P_\sigma + \varepsilon Q_\sigma \tag{185}$$

for 

$$0 < \varepsilon < 1/2 \quad \text{and} \quad Q \in \{P_5, P_{10}, \text{Logistic}, \text{Cauchy}\}. \tag{186}$$

Verify in this manner the stronger robustness of the min $\mathfrak{R}_\alpha$-estimators theoretically 
justified in Conclusion 3.4.1. 
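A possible starting point for such a simulation is sketched below in Python. It is only a partial sketch under stated assumptions: the Rényi-type scale estimator is computed by maximizing the criterion $\sigma^{-\alpha/(1+\alpha)}\,\frac{1}{n}\sum_i \exp\{-\alpha X_i^2/2\sigma^2\}$, which is assumed here because its $\sigma$-derivative is proportional to $[(x/\sigma)^2 - \frac{1}{1+\alpha}]\exp\{-\alpha x^2/2\sigma^2\}$; the contaminating law is taken to be a centered normal with standard deviation 10; and the comparison is made with the MLE ($\alpha = 0$) rather than with the power pseudodistance estimators, which would have to be added. All function names are hypothetical.

```python
import math
import random

def renyi_objective(sigma, xs, alpha):
    """Assumed criterion: sigma^(-alpha/(1+alpha)) * mean(exp(-alpha x^2 / (2 sigma^2)))."""
    c = -0.5 * alpha / sigma ** 2
    mean_exp = sum(math.exp(c * x * x) for x in xs) / len(xs)
    return sigma ** (-alpha / (1.0 + alpha)) * mean_exp

def renyi_scale(xs, alpha, grid=80, refinements=40):
    """Maximize the criterion: log-spaced grid search, then golden-section refinement."""
    med = sorted(abs(x) for x in xs)[len(xs) // 2]
    lo, hi = med / 10.0, med * 20.0
    sigmas = [lo * (hi / lo) ** (k / (grid - 1)) for k in range(grid)]
    values = [renyi_objective(s, xs, alpha) for s in sigmas]
    best = max(range(grid), key=values.__getitem__)
    a, b = sigmas[max(best - 1, 0)], sigmas[min(best + 1, grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = renyi_objective(c, xs, alpha), renyi_objective(d, xs, alpha)
    for _ in range(refinements):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = renyi_objective(c, xs, alpha)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = renyi_objective(d, xs, alpha)
    return 0.5 * (a + b)

def mle_scale(xs):
    """MLE of sigma in the centered normal model."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def contaminated_sample(n, sigma, eps, outlier_scale, rng):
    """Draw from (1 - eps) N(0, sigma^2) + eps N(0, outlier_scale^2)."""
    return [rng.gauss(0.0, outlier_scale if rng.random() < eps else sigma)
            for _ in range(n)]

def compare_mse(n, sigma, eps, outlier_scale, alpha, reps, rng):
    """Monte Carlo mean squared errors (MLE, Renyi-type estimator) for the true sigma."""
    se_mle = se_renyi = 0.0
    for _ in range(reps):
        xs = contaminated_sample(n, sigma, eps, outlier_scale, rng)
        se_mle += (mle_scale(xs) - sigma) ** 2
        se_renyi += (renyi_scale(xs, alpha) - sigma) ** 2
    return se_mle / reps, se_renyi / reps

if __name__ == "__main__":
    rng = random.Random(2009)
    print(compare_mse(n=300, sigma=2.0, eps=0.1, outlier_scale=10.0,
                      alpha=0.5, reps=20, rng=rng))
```

Other contaminating laws from (186) can be substituted by replacing the outlier draw in `contaminated_sample`; the grid bounds in `renyi_scale` are heuristic and may need widening for heavier contamination.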



Acknowledgement. This research was supported by the grants GACR 102/07/1131 
and MSMT 1M 0572. The authors thank the PhD student Iva Frydlova for careful reading 
and corrections of many previous versions of the first two sections. They also thank 
the MSc student Radim Demut for simulations of the Rényi estimators in contaminated 
families. The very promising results he obtained encouraged the theoretical research 
presented here. 






References 



[1] D. F. Andrews, P. J. Bickel, F. R. Hampel, P. J. Huber, W. H. Rogers and J. W. 
Tukey (1972). Robust Estimates of Location. Princeton University Press, Princeton, 
N.J. 

[2] A. Basu, I. R. Harris, N. L. Hjort and M. C. Jones (1998). "Robust and efficient 
estimation by minimizing a density power divergence," Biometrika, vol. 85, No. 3, 
pp. 549-559. 

[3] M. Broniatowski and A. Keziou (2006). "Minimization of φ-divergences on sets of 
signed measures," Studia Scientiarum Mathematica Hungarica, vol. 43, pp. 403-442. 

[4] M. Broniatowski and A. Keziou (2009). "Parametric estimation and tests through 
divergences and the duality technique," Journal of Multivariate Analysis, vol. XX, 
pp. ABC-ABD. 

[5] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel (1986). Robust 
Statistics: The Approach Based on Influence Functions, New York: Wiley. 

[6] F. Liese and I. Vajda, (1987). Convex Statistical Distances, Leipzig: Teubner. 

[7] F. Liese and I. Vajda, (2006). "On divergences and informations in statistics and 
information theory," IEEE Transactions on Information Theory, vol. 52, No. 10, pp. 
4394-4412. 

[8] C. Miescke and F. Liese (2008). Statistical Decision Theory, Berlin: Springer. 

[9] T. R. C. Read and N. A. C. Cressie (1988). Goodness-of-Fit Statistics for Discrete 
Multivariate Data, Berlin: Springer. 

[10] A. Rényi (1961). "On measures of entropy and information," Proc. 4th Berkeley 
Symp. on Probability and Statistics, vol. 1, pp. 547-561. Berkeley: University of 
California Press. 

[11] A. Toma and M. Broniatowski (2008). "Minimum divergence estimators and tests: 
Robustness results," submitted. 

[12] I. Vajda, (1984). Minimum divergence principle in statistical estimation. Statistics 
and Decisions, Suppl. Issue No. 1, pp. 239-261. 

[13] I. Vajda, (1986). Efficiency and robustness control via distorted maximum likelihood 
estimation. Kybernetika, vol. 22, pp. 47-67. 

[14] I. Vajda, (1989a). Comparison of asymptotic variances for several estimators of 
location. Problems of Control and Information Theory, vol. 18, No. 2, pp. 79-89. 

[15] I. Vajda, (1989b). Estimators asymptotically minimax in wide sense. Biometrical 
Journal, vol. 31, No. 7, pp. 803-810. 






[16] I. Vajda, (2008). Modifications of Divergence Criteria for Applications in Continuous 
Families. Research Report No. 2230, Institute of Information Theory and Automation, 
Prague, November 2008. 

[17] A. W. van der Vaart and J. A. Wellner (1996). Weak Convergence and Empirical 
Processes, Berlin: Springer. 

[18] A. W. van der Vaart (1998). Asymptotic Statistics. Cambridge University Press, 
Cambridge. 






